The memory wall is the growing disparity between the speed of the CPU and the response time (memory latency) of memory outside the CPU chip. An important reason for this disparity is the limited communication bandwidth across chip boundaries. From 1986 to 2000, CPU speed improved at an annual rate of 55% while off-chip memory response time improved at only 10%. Given these trends, it was expected that memory latency would become an overwhelming bottleneck in computer performance.

Another reason for the disparity is the enormous increase in the size of memory since the start of the PC revolution in the 1980s. Originally, PCs contained less than 1 megabyte of RAM, which often had a response time of one CPU clock cycle, meaning that it required zero wait states. Larger memory units are inherently slower than smaller ones of the same type, simply because signals take longer to traverse a larger circuit. Constructing a memory unit of many gigabytes with a response time of one clock cycle is difficult or impossible. Modern CPUs often still have zero-wait-state cache memory, but because of the bandwidth limitations of chip-to-chip communication, it must reside on the same chip as the CPU cores. It must also be constructed from static RAM, which is far more expensive than the dynamic RAM used for larger memories. CPU speed improvements have since slowed significantly, partly due to major physical barriers and partly because current CPU designs have already hit the memory wall in some sense.
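The compounding effect of those growth rates can be illustrated with a short calculation. The 55% and 10% annual figures are those cited above; the sketch below is purely illustrative:

```python
# Illustrative: how a 55%/yr CPU improvement vs. a 10%/yr memory
# improvement compounds into a large processor-memory gap.

def relative_gap(cpu_rate, mem_rate, years):
    """Ratio of cumulative CPU speedup to memory speedup after `years` years."""
    return ((1 + cpu_rate) ** years) / ((1 + mem_rate) ** years)

# From 1986 to 2000 (14 years) at the rates cited above:
gap = relative_gap(0.55, 0.10, 14)
print(f"CPU outpaced memory by a factor of ~{gap:.0f}x")  # roughly 120x
```

Even though each year's difference seems modest, fourteen years of compounding leaves the processor roughly two orders of magnitude ahead of memory.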
Intel summarized these causes in a 2005 document. First, as chip geometries shrink and clock frequencies rise, the transistor leakage current increases, leading to excess power consumption and heat. Second, the advantages of higher clock speeds are in part negated by memory latency, since memory access times have not been able to keep pace with increasing clock frequencies. Third, for certain applications, traditional serial architectures are becoming less efficient as processors get faster (due to the so-called von Neumann bottleneck), further undercutting any gains that frequency increases might otherwise buy. In addition, partly due to limitations in the means of producing inductance within solid-state devices, resistance-capacitance (RC) delays in signal transmission grow as feature sizes shrink, imposing an additional bottleneck that frequency increases do not address. These RC delays were also noted in "Clock Rate versus IPC: The End of the Road for Conventional Microarchitectures", which projected a maximum of 12.5% average annual CPU performance improvement between 2000 and 2014.

A different concept is the processor-memory performance gap, which can be addressed by 3D integrated circuits that reduce the distance between the logic and memory elements, which are farther apart in a 2D chip. Memory subsystem design requires a focus on this gap, which widens over time. The main method of bridging the gap is the use of caches: small amounts of high-speed memory near the processor that hold recently used data and instructions, speeding up execution when those items are accessed again. Multiple levels of caching have been developed to deal with the widening gap, and the performance of high-speed modern computers relies on evolving caching techniques. There can be up to a 53% difference between the annual growth in processor speed and the lagging growth in main memory access speed.
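The RC-delay scaling mentioned above follows from a standard first-order wire model (the symbols here are generic notation, not taken from the cited paper): a wire of length L, width W, and height H has resistance proportional to L/(WH) and capacitance roughly proportional to L, so

```latex
% First-order delay of an on-chip wire of length L, width W, height H:
%   resistance:  R = \rho L / (W H)      (\rho: resistivity)
%   capacitance: C \approx c L           (c: capacitance per unit length)
\tau \;\propto\; R C \;=\; \frac{\rho L}{W H}\,\cdot\, c L \;=\; \frac{\rho c\, L^{2}}{W H}
```

For a wire whose length is fixed by chip dimensions, shrinking W and H along with the feature size increases the delay, which is why RC delay worsens as geometries shrink even though the transistors themselves get faster.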
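The benefit of a cache can be quantified with the standard average memory access time (AMAT) model. The cycle counts below are illustrative assumptions, not figures for any particular CPU:

```python
# Average Memory Access Time (AMAT) model:
#   AMAT = hit_time + miss_rate * miss_penalty
# Illustrative assumption: a 1-cycle cache in front of DRAM that
# costs 200 cycles per access.

def amat(hit_time, miss_rate, miss_penalty):
    """Expected cycles per memory access with one cache level."""
    return hit_time + miss_rate * miss_penalty

without_cache = 200  # every access pays the full DRAM latency
with_cache = amat(hit_time=1, miss_rate=0.05, miss_penalty=200)
print(with_cache)                  # 1 + 0.05 * 200 = 11.0 cycles
print(without_cache / with_cache)  # ~18x fewer cycles on average
```

The model also shows why multiple cache levels help: each level replaces part of the miss penalty of the level above it with a smaller one, applied recursively.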
Solid-state drives have continued to increase in speed, from roughly 400 Mbit/s via SATA3 in 2012 up to roughly 7 GB/s via NVMe/PCIe in 2024, narrowing the gap between RAM and disk speeds, although RAM remains an order of magnitude faster, with DDR5-8000 transferring 8,000 MT/s per 64-bit channel, about 64 GB/s (128 GB/s in a dual-channel configuration), and modern GDDR even faster. Fast, cheap, non-volatile solid-state drives have replaced some functions formerly performed by RAM, such as holding certain data for immediate availability in server farms: 1 terabyte of SSD storage can be had for about $200, while 1 TB of RAM costs thousands of dollars.

==Timeline==