With the problem of general mutual exclusion solved, attention turned to some special cases. Consider, for example, a critical section that guards access (reading and writing) to a shared data structure. Some processes will want to write to that structure, and some will want to read from it. We certainly want to protect the data structure from concurrent writes, and we want to ensure that reads don't happen in the middle of a write. But is it reasonable to forbid concurrent reads? In 1971, Courtois, Heymans, and Parnas  proposed two algorithms that allowed concurrent reads while ensuring that writes are exclusive both of each other and of reads. One solution attempted to have a minimal delay for reads, and the other a minimal delay for writes.
The first solution was quite simple and ingenious: simply treat all readers as one process. The first reader who wants entry must get the main semaphore. While at least one reader is in the critical section, other readers are allowed free entry and exit. The last reader to exit releases the semaphore and allows entry to a waiting writer. This solution has the fault that a writer may be blocked indefinitely while an endless stream of readers passes through the critical code. Their second solution reverses this by allowing readers to block indefinitely while writers get immediate access to the data.
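The first solution can be sketched with counting semaphores (a minimal Python sketch; the class and method names are my own, not Courtois, Heymans, and Parnas's):

```python
import threading

class ReadersPreferenceLock:
    """Sketch of the first Courtois-Heymans-Parnas solution:
    all readers are treated as a single process with respect to writers."""
    def __init__(self):
        self.resource = threading.Semaphore(1)  # the main semaphore
        self.mutex = threading.Semaphore(1)     # protects read_count
        self.read_count = 0

    def start_read(self):
        with self.mutex:
            self.read_count += 1
            if self.read_count == 1:        # first reader locks out writers
                self.resource.acquire()

    def end_read(self):
        with self.mutex:
            self.read_count -= 1
            if self.read_count == 0:        # last reader lets writers in
                self.resource.release()

    def start_write(self):
        self.resource.acquire()             # writers are fully exclusive

    def end_write(self):
        self.resource.release()
```

Note that while any reader holds the lock, a writer blocks on `resource`, and newly arriving readers pass straight through; this is exactly the starvation the paragraph above describes.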
When dealing with multiprocessor systems, it is important to re-evaluate the solutions used in uniprocessor systems. For example, mutual exclusion is a reasonable solution on a uniprocessor because only one process can execute at any one time, so mutual exclusion really only specifies the order in which processes are scheduled. On multiprocessor systems, however, mutual exclusion is an expensive operation, not only because of the overhead involved, but because one processor will sit idle while another has access to the data. This should motivate algorithms that allow readers and writers to access the data without forcing either to block. Even a partial solution in which only writers never block would be useful for large data structures, where an update (write) might take some time and should be allowed to proceed immediately.
Leslie Lamport suggested an algorithm  which would always allow a single writer to write without blocking. The basic idea was to allow the writer to write at any time, and the reader to repeatedly read the data until it is sure that it has a valid copy. The writer should write data version numbers both before and after the update, and the reader should read them in reverse order and compare them. Pseudo-code may help clarify the algorithm:
    /* Algorithm for writer */        /* Algorithm for reader */
    v1 := v1 + 1;                     repeat
    write data;                           temp := v2;
    v2 := v1;                             read data;
                                      until v1 = temp;
This assumes that the version numbers themselves can be written and read atomically. If they cannot, then they must be written in such a way as to guarantee that the reader will get a value of v1 which is greater than or equal to the actual value, and a value of v2 which is less than or equal to the actual value. Lamport showed that if v1 is written most significant digit first and read least significant digit first, and if v2 is written and read in the opposite order, then the above properties will hold and an atomic write is not necessary. (He does assume that, at some level, an atomic write is possible, even if this write is at the level of a single bit.)
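Assuming the version numbers are read and written atomically, as they are within a single Python interpreter, the protocol can be sketched as follows (an illustrative rendering, not Lamport's own code):

```python
class VersionedData:
    """Sketch of Lamport's non-blocking writer / retrying reader.
    Assumes v1 and v2 themselves are read and written atomically."""
    def __init__(self, value):
        self.v1 = 0
        self.v2 = 0
        self.data = value

    def write(self, value):          # the writer never blocks
        self.v1 += 1                 # v1 := v1 + 1
        self.data = value            # write data
        self.v2 = self.v1            # v2 := v1

    def read(self):
        while True:                  # repeat ... until v1 = temp
            temp = self.v2           # temp := v2
            value = self.data        # read data
            if self.v1 == temp:      # no write overlapped this read
                return value
```

A reader that overlaps a write sees v1 greater than the v2 it sampled, and simply retries; the same structure survives today in the Linux kernel's seqlocks.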
When mutual exclusion is the only option, we are then faced with the potentially large delays incurred in the entry section, particularly in large distributed systems. Operating system designers came to believe that the great majority of processes entering the entry section were allowed into the critical section without delay; i.e., that contention was very low. With this in mind, several algorithms were developed that tried to reduce the access time of the early mutual exclusion algorithms. Leslie Lamport once again led the way with his ``fast'' mutual exclusion algorithm , which required only seven shared-memory accesses in the event of no contention.
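The structure of Lamport's fast algorithm can be sketched as follows (a Python rendering of the control flow; the real algorithm assumes sequentially consistent shared memory, which CPython's global interpreter lock happens to provide here, and the seven accesses count the uncontended entry and exit together):

```python
import threading
import time

N = 2                       # number of processes, ids 1..N
x = 0                       # communication variables of the algorithm
y = 0                       # 0 means "no process"
b = [False] * (N + 1)       # b[1..N]; index 0 unused

def lock(i):
    """Entry section for process i (ids start at 1)."""
    global x, y
    while True:
        b[i] = True
        x = i
        if y != 0:                       # someone else is in or entering
            b[i] = False
            while y != 0:
                time.sleep(0)            # yield while busy-waiting
            continue                     # retry from the start
        y = i
        if x != i:                       # we raced with another process
            b[i] = False
            for j in range(1, N + 1):
                while b[j]:
                    time.sleep(0)
            if y != i:
                while y != 0:
                    time.sleep(0)
                continue
        return                           # uncontended path taken

def unlock(i):
    global y
    y = 0
    b[i] = False
```

On the uncontended path a process performs b[i] := true, x := i, a read of y, y := i, and a read of x on entry, plus y := 0 and b[i] := false on exit: seven accesses in all.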
It was then noticed that most distributed systems were set up so that each processor could quickly access its local memory and slowly access other processors' memories. The interconnect network was usually shared by all processors, and remote spin-locking seriously affected communication among other processors, even those that were not attempting entry into any critical section. Attention turned to minimizing remote memory accesses, and Mellor-Crummey and Scott developed an algorithm  in which each processor spins only on variables in its own local memory, so that each lock acquisition generates only a constant number of remote references.
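The queue-based idea behind the Mellor-Crummey and Scott (MCS) lock can be sketched as follows; the helpers guarded by `_hw` stand in for the atomic fetch-and-store and compare-and-swap instructions the real algorithm assumes the hardware provides:

```python
import threading
import time

class Node:
    """Per-processor queue record; each processor spins only on its own flag."""
    def __init__(self):
        self.locked = False
        self.next = None

class MCSLock:
    def __init__(self):
        self.tail = None
        self._hw = threading.Lock()   # emulates hardware atomic instructions

    def _fetch_and_store(self, node):
        with self._hw:
            prev, self.tail = self.tail, node
            return prev

    def _compare_and_swap(self, expect, new):
        with self._hw:
            if self.tail is expect:
                self.tail = new
                return True
            return False

    def acquire(self, node):
        node.next = None
        pred = self._fetch_and_store(node)   # join the tail of the queue
        if pred is not None:
            node.locked = True
            pred.next = node                 # link behind our predecessor
            while node.locked:               # spin on our own (local) flag
                time.sleep(0)

    def release(self, node):
        if node.next is None:
            # no known successor; try to reset the queue to empty
            if self._compare_and_swap(node, None):
                return
            while node.next is None:         # a successor is mid-enqueue
                time.sleep(0)
        node.next.locked = False             # hand the lock to the successor
```

Because each waiter spins on a flag in its own node rather than on a shared variable, the spinning traffic stays in local memory on a real machine.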
In the late 1980s, multiprocessor systems made from ordinary CPUs became popular, and since these CPUs often did not have atomic test-and-set operations, operating system designers were once again faced with having to write mutual-exclusion algorithms from atomic reads and writes. Yang and Anderson  developed an algorithm which required only atomic reads and writes. In fact, performance tests showed that even under heavy contention it rivaled algorithms that used hardware-supported atomic operations.
The delay in the entry section is a particularly important issue in message-passing systems, and several papers have focused on trying to reduce this overhead. One of the first was Ricart and Agrawala's solution , which used 2(N-1) messages: N-1 messages to ask for permission and N-1 to grant it. This algorithm required consent from all of the other processors. Thomas  recognized that only a majority of the processors needed to consent, and modified the algorithm to reduce the number of messages by half. Maekawa followed with a solution  in which groups of roughly √N processors were created and each processor belonged to several groups; this was a variation on the old divide-and-conquer idea.
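The Ricart-Agrawala message pattern can be illustrated with a toy, single-threaded simulation (contention-free, so every peer grants a request immediately; class and method names are illustrative, not from the paper):

```python
class RANode:
    """Toy sketch of Ricart-Agrawala message flow with no contention."""
    def __init__(self, node_id):
        self.id = node_id
        self.peers = []       # every other node in the system
        self.clock = 0        # Lamport logical clock for timestamping
        self.sent = 0         # messages this node has sent

    def request_cs(self):
        self.clock += 1
        stamp = (self.clock, self.id)          # timestamp breaks ties
        grants = 0
        for peer in self.peers:
            self.sent += 1                     # one REQUEST per peer
            if peer.receive_request(stamp):    # peer replies at once
                grants += 1
        assert grants == len(self.peers)       # all N-1 REPLYs arrived
        # ... enter the critical section here ...

    def receive_request(self, stamp):
        self.clock = max(self.clock, stamp[0]) + 1
        self.sent += 1                         # one REPLY message
        return True                            # grant: no competing request
```

Each entry therefore costs N-1 REQUEST messages plus N-1 REPLY messages, the 2(N-1) total mentioned above; under contention a node defers its REPLY until its own earlier-timestamped request has been served.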