Counters

To demonstrate the power and necessity of linearizability, consider a simple counter object that multiple processes can access. Many common systems use counters to keep track of the number of times an event has occurred. The counter supports two operations:

• Increment - adds 1 to the value stored in the counter and returns an acknowledgement.
• Read - returns the current value stored in the counter without changing it.

We will attempt to implement this counter object using shared registers. Our first attempt, which we will see is non-linearizable, uses a single register shared among the processes.
Non-atomic

The naive, non-atomic implementation:

Increment:
• Read the value in the register R.
• Add one to the value.
• Write the new value back into register R.

Read: Read register R.

This simple implementation is not linearizable, as the following example demonstrates. Imagine two processes are accessing a single counter initialized to 0:

• The first process reads the value in the register as 0.
• The first process adds one to the value; the counter's value should become 1. But before it has finished writing the new value back to the register, it is suspended, and meanwhile the second process runs:
• The second process reads the value in the register, which is still 0.
• The second process adds one to the value.
• The second process writes the new value into the register; the register now holds 1. The second process finishes, and the first process continues from where it left off:
• The first process writes 1 into the register, unaware that the second process has already updated the register to 1.

In the example above, two processes invoked an increment operation, yet the counter's value only increased from 0 to 1 instead of to 2 as it should have. One of the increments was lost because the implementation is not linearizable. The example shows why implementations of shared data structures must be thought through carefully, and how linearizability affects the correctness of a system.
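The lost-update interleaving described above can be replayed deterministically. The following Python sketch (the `Register` class and `lost_update_demo` names are illustrative, not from the original) performs the two processes' steps in exactly the problematic order:

```python
class Register:
    """A single shared register holding one value."""
    def __init__(self, value=0):
        self.value = value

def lost_update_demo():
    """Replay the exact interleaving from the example, step by step."""
    r = Register(0)
    p1_local = r.value       # P1 reads 0
    p1_local += 1            # P1 computes 1...
    # ...but P1 is suspended before writing back; P2 runs a full increment:
    p2_local = r.value       # P2 also reads 0 (the stale value)
    p2_local += 1            # P2 computes 1
    r.value = p2_local       # register now holds 1
    # P1 resumes and overwrites with its own (equally stale) result:
    r.value = p1_local       # register still holds 1: one increment is lost
    return r.value

print(lost_update_demo())  # 1, although increment was invoked twice
```

Because each increment is split into separate read and write steps, nothing prevents both processes from reading the same old value, so one of the two writes is silently discarded.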
Atomic

To implement a linearizable or atomic counter object, we modify the previous implementation so that each process Pi uses its own register Ri. This is similar to how grow-only counter CRDTs work. Each process increments and reads according to the following algorithm:
Increment:
• Read the value in register Ri.
• Add one to the value.
• Write the new value back into Ri.

Read:
• Read registers R1, R2, ..., Rn.
• Return the sum of all registers.

This implementation solves the problem with the original one. In this system, an increment operation is linearized at its write step: the linearization point of an increment is the moment it writes the new value into its register Ri. Because each register Ri has only a single writer, no increment can overwrite another's update. A read operation is linearized at a point during its execution at which the sum of the values stored in the registers equals the value the read returns.

This is a trivial example. In a real system, the operations can be more complex and the errors introduced extremely subtle. For example, reading a 64-bit value from memory may actually be implemented as two sequential reads of two 32-bit memory locations. If a process has read only the first 32 bits, and the value in memory changes before it reads the second 32 bits, it will end up with neither the original value nor the new value but a mixed-up value. Furthermore, the specific order in which the processes run can change the result, making such an error difficult to detect, reproduce and debug.
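The per-register algorithm above can be sketched as follows. This is a minimal illustration (the class name and slot layout are assumptions, not part of the original): each process only ever writes its own slot, which is what makes the write step a valid linearization point, and a read sums all slots.

```python
class PerProcessCounter:
    """Atomic counter built from one register per process (R1..Rn)."""
    def __init__(self, n_processes):
        self.regs = [0] * n_processes  # register Ri for process Pi

    def increment(self, i):
        v = self.regs[i]        # read own register Ri
        self.regs[i] = v + 1    # write back; no other process writes Ri

    def read(self):
        return sum(self.regs)   # read R1..Rn and return their sum

c = PerProcessCounter(3)
c.increment(0)
c.increment(0)
c.increment(2)
print(c.read())  # 3
```

The lost-update race from the non-atomic version cannot occur here: even if process P1 is suspended between its read and write, no other process touches P1's register in the meantime.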
Compare-and-swap

Most systems provide an atomic compare-and-swap instruction that reads from a memory location, compares the value with an "expected" value provided by the caller, writes a "new" value if the two match, and returns whether the update succeeded. We can use this to fix the non-atomic counter algorithm as follows:

1. Read the value in the memory location.
2. Add one to the value.
3. Use compare-and-swap to write the incremented value back.
4. Retry if the value read by the compare-and-swap did not match the value originally read.

Since the compare-and-swap occurs (or appears to occur) instantaneously, if another process updates the location while we are in progress, the compare-and-swap is guaranteed to fail.
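The retry loop can be sketched in Python. Pure Python has no hardware compare-and-swap, so the `CASRegister` class below simulates one (the internal lock merely stands in for the atomicity a real instruction provides in a single step); the `increment` function is the four-step algorithm above.

```python
import threading

class CASRegister:
    """Simulated atomic compare-and-swap register. On real hardware this
    is one instruction; the lock here only mimics that atomicity."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False

def increment(reg):
    while True:
        old = reg.read()                        # step 1: read current value
        if reg.compare_and_swap(old, old + 1):  # steps 2-3: write back iff unchanged
            return
        # step 4: another process updated the location; retry

reg = CASRegister()
threads = [threading.Thread(target=lambda: [increment(reg) for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(reg.read())  # 4000: no increment is lost
```

Every successful compare-and-swap adds exactly one, and every failed attempt is retried, so all 4000 increments survive regardless of how the threads interleave.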
Fetch-and-increment

Many systems provide an atomic fetch-and-increment instruction that reads from a memory location, unconditionally writes a new value (the old value plus one), and returns the old value. We can use this to fix the non-atomic counter algorithm as follows:

1. Use fetch-and-increment to read the old value and write the incremented value back.

For some algorithms, such as the one shown here, fetch-and-increment is always better than compare-and-swap: it requires fewer memory references and never needs to retry. This is true even though Herlihy earlier proved that compare-and-swap is better for certain other algorithms that cannot be implemented at all using only fetch-and-increment. So CPU designs that provide both fetch-and-increment and compare-and-swap (or equivalent instructions) may be a better choice than ones with only one or the other.
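A sketch of the fetch-and-increment approach, again simulated since pure Python exposes no such instruction (on x86 hardware this corresponds to a single locked add-and-return-old instruction; the class name is illustrative):

```python
import threading

class FAIRegister:
    """Simulated atomic fetch-and-increment register. The lock stands in
    for the single-instruction atomicity of real hardware."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_and_increment(self):
        with self._lock:
            old = self._value
            self._value = old + 1
            return old          # the caller sees the value before the add

reg = FAIRegister()
threads = [threading.Thread(
               target=lambda: [reg.fetch_and_increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(reg.fetch_and_increment())  # 4000: all 4000 prior increments took effect
```

Note the contrast with the compare-and-swap version: each increment is a single operation that always succeeds, with no read-then-retry loop.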
Locking

Another approach is to turn the naive algorithm into a critical section using a lock, preventing other threads from disrupting it. Once again fixing the non-atomic counter algorithm:

1. Acquire a lock, excluding other threads from running the critical section (steps 2-4) at the same time.
2. Read the value in the memory location.
3. Add one to the value.
4. Write the incremented value back to the memory location.
5. Release the lock.

This strategy works as expected; the lock prevents other threads from updating the value until it is released. However, compared with direct use of atomic operations, it can suffer significant overhead due to lock contention. To improve program performance, it may therefore be a good idea to replace simple critical sections with atomic operations for non-blocking synchronization (as we have just done for the counter with compare-and-swap and fetch-and-increment) instead of the other way around. Unfortunately, a significant improvement is not guaranteed, and lock-free algorithms can easily become too complicated to be worth the effort.
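The five-step locked algorithm can be sketched directly with Python's standard `threading.Lock` (the `LockedCounter` name is illustrative):

```python
import threading

class LockedCounter:
    """Counter whose read-add-write sequence is protected by a lock."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:        # step 1: acquire the lock
            v = self._value     # step 2: read the value
            v += 1              # step 3: add one
            self._value = v     # step 4: write it back
                                # step 5: lock released on leaving the block

    def read(self):
        with self._lock:
            return self._value

c = LockedCounter()
threads = [threading.Thread(target=lambda: [c.increment() for _ in range(1000)])
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(c.read())  # 4000
```

The result is the same as with the atomic-instruction versions; the difference is that while one thread holds the lock, every other thread attempting to increment must block and wait, which is the contention overhead discussed above.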