When a program runs on a single-CPU machine, the hardware performs the necessary bookkeeping to ensure that the program executes as if all memory operations were performed in the order specified by the programmer (program order), so memory barriers are not necessary. However, when the memory is shared with multiple devices, such as other CPUs in a multiprocessor system, or
memory-mapped peripherals, out-of-order access may affect program behavior. For example, a second CPU may observe memory changes made by the first CPU in a sequence that differs from program order.

A program runs as a process, which can be multi-threaded (i.e. composed of software threads, such as pthreads, as opposed to hardware threads). Different processes do not share a memory space, so this discussion does not apply to two programs each running in a different process (and hence a different memory space). It applies to two or more software threads running in a single process, and therefore sharing a single memory space. Multiple software threads, within a single process, may run
concurrently on a
multi-core processor.

The following multi-threaded program, running on a multi-core processor, gives an example of how such out-of-order execution can affect program behavior.

Initially, memory locations x and f both hold the value 0. The software thread running on processor #1 loops while the value of f is zero, then it prints the value of x. The software thread running on processor #2 stores the value 42 into x and then stores the value 1 into f. Pseudo-code for the two program fragments is shown below. The steps of the program correspond to individual processor instructions.

Thread #1 (core #1):

 while (f == 0)
   continue;
 // Memory fence required here
 println(x);

Thread #2 (core #2):

 x = 42;
 // Memory fence required here
 f = 1;

One might expect the print statement to always print the number "42"; however, if thread #2's store operations are executed out-of-order, it is possible for f to be updated before x, and the print statement might therefore print "0". Similarly, thread #1's load operations may be executed out-of-order, and it is possible for x to be read before f is checked, and again the print statement might therefore print an unexpected value. For most programs neither of these situations is acceptable. A memory barrier must be inserted before thread #2's assignment to f to ensure that the new value of x is visible to other processors at or prior to the change in the value of f. A memory barrier must also be inserted before thread #1's access to x to ensure the value of x is not read prior to seeing the change in the value of f.

Another example is when a driver performs the following sequence:

 // prepare data for a hardware module
 // Memory fence required here
 // trigger the hardware module to process the data

If the processor's store operations are executed out-of-order, the hardware module may be triggered before the data is ready in memory.

For another illustrative example (a non-trivial one that arises in actual practice), see
double-checked locking. In the case of the PowerPC processor, the eieio ("Enforce In-order Execution of I/O") instruction ensures, as a memory fence, that any load or store operations previously initiated by the processor are fully completed with respect to main memory before any subsequent load or store operations initiated by the processor access main memory. In the case of the
ARM architecture family, the DMB (Data Memory Barrier), DSB (Data Synchronization Barrier), and ISB (Instruction Synchronization Barrier) instructions are used. In the case of the RISC-V architecture, the FENCE instruction is used. In the case of the x86 architecture, the SFENCE, LFENCE, and MFENCE instructions are used.

==Multithreaded programming and memory visibility==