Hardware implements cache as a
block of memory for temporary storage of data likely to be used again.
Central processing units (CPUs),
solid-state drives (SSDs) and hard disk drives (HDDs) frequently include hardware-based cache, while
web browsers and
web servers commonly rely on software caching. A cache is made up of a pool of entries. Each entry has associated
data, which is a copy of the same data in some
backing store. Each entry also has a
tag, which specifies the identity of the data in the backing store of which the entry is a copy. When the cache client (a CPU, web browser,
operating system) needs to access data presumed to exist in the backing store, it first checks the cache. If an entry can be found with a tag matching that of the desired data, the data in the entry is used instead. This situation is known as a
cache hit. For example, a web browser program might check its local cache on disk to see if it has a local copy of the contents of a web page at a particular
URL. In this example, the URL is the tag, and the content of the web page is the data. The percentage of accesses that result in cache hits is known as the
hit rate or
hit ratio of the cache. The alternative situation, when the cache is checked and found not to contain any entry with the desired tag, is known as a
cache miss. This requires a more expensive access of data from the backing store. Once the requested data is retrieved, it is typically copied into the cache, ready for the next access. During a cache miss, some other previously existing cache entry is typically removed in order to make room for the newly retrieved data. The
heuristic used to select the entry to replace is known as the
replacement policy. One popular replacement policy, least recently used (LRU), replaces the entry that was accessed less recently than any other entry. More sophisticated caching algorithms also take into account the frequency of use of entries.
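A minimal sketch of this lookup-and-replace cycle is shown below in Python. The class name and the caller-supplied backing-store function are hypothetical and for illustration only; the sketch records hits and misses and evicts the least recently used entry when the cache is full:

```python
from collections import OrderedDict

class LRUCache:
    """Toy LRU cache: tags map to copies of backing-store data."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity              # maximum number of entries
        self.backing_store = backing_store    # function: tag -> data (the slow store)
        self.entries = OrderedDict()          # tag -> data, ordered by recency of use
        self.hits = 0
        self.misses = 0

    def read(self, tag):
        if tag in self.entries:               # cache hit: use the cached copy
            self.hits += 1
            self.entries.move_to_end(tag)     # mark as most recently used
            return self.entries[tag]
        self.misses += 1                      # cache miss: go to the backing store
        data = self.backing_store(tag)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[tag] = data              # keep a copy ready for the next access
        return data

    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

With a capacity of 2, the access sequence A, B, A, C yields one hit (the second access to A) and three misses, for a hit ratio of 0.25; B is evicted to make room for C.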

==Examples of hardware caches== is covered later; first, write behaviour.

==Write policies==
Cache writes must eventually be propagated to the backing store. The timing for this is governed by the
write policy. The two primary write policies are:
• Write-through: Writes are performed synchronously to both the cache and the backing store.
• Write-back: Initially, writing is done only to the cache. The write to the backing store is postponed until the modified content is about to be replaced by another cache block. A write-back cache is more complex to implement since it needs to track which of its locations have been written over and mark them as
dirty for later writing to the backing store. The data in these locations are written back to the backing store only when they are evicted from the cache, a process referred to as a
lazy write. For this reason, a read miss in a write-back cache may require two memory accesses to the backing store: one to write back the dirty data, and one to retrieve the requested data. Other policies may also trigger data write-back. The client may make many changes to data in the cache, and then explicitly notify the cache to write back the data. Write operations do not return data. Consequently, a decision needs to be made for write misses: whether or not to load the data into the cache. This is determined by these
write-miss policies:
• Write allocate (also called fetch on write): Data at the missed-write location is loaded to cache, followed by a write-hit operation. In this approach, write misses are similar to read misses.
• No-write allocate (also called write-no-allocate or write around): Data at the missed-write location is not loaded to cache, and is written directly to the backing store. In this approach, data is loaded into the cache on read misses only.
While both write policies can implement either write-miss policy, they are typically paired as follows (see the sketch at the end of this section):
• A write-back cache typically employs write allocate, anticipating that subsequent writes or reads to the same location will benefit from having the data already in the cache.
• A write-through cache uses no-write allocate. Here, subsequent writes have no advantage, since they still need to be written directly to the backing store.
Entities other than the cache may change the data in the backing store, in which case the copy in the cache may become out-of-date or
stale. Alternatively, when the client updates the data in the cache, copies of that data in other caches will become stale. Communication protocols between the cache managers that keep the data consistent are associated with
cache coherence.
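The typical pairings can be sketched in Python as follows. This is an illustrative toy, assuming a dictionary as the backing store and hypothetical class names, not a description of any real cache hardware:

```python
class WriteBackCache:
    """Write-back with write allocate: writes go to the cache; dirty
    entries reach the backing store only when they are evicted."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing_store = backing_store   # dict standing in for the slow store
        self.entries = {}                    # tag -> data
        self.dirty = set()                   # tags modified since they were loaded

    def write(self, tag, data):
        if tag not in self.entries:          # write miss: allocate (fetch on write)
            self._allocate(tag)
        self.entries[tag] = data             # write hit: update only the cache
        self.dirty.add(tag)

    def _allocate(self, tag):
        if len(self.entries) >= self.capacity:
            victim = next(iter(self.entries))            # simplistic eviction choice
            if victim in self.dirty:                     # lazy write of dirty data
                self.backing_store[victim] = self.entries[victim]
                self.dirty.discard(victim)
            del self.entries[victim]
        self.entries[tag] = self.backing_store.get(tag)  # fetch on write


class WriteThroughCache:
    """Write-through with no-write allocate: every write goes straight
    to the backing store; write misses do not populate the cache."""

    def __init__(self, backing_store):
        self.backing_store = backing_store
        self.entries = {}

    def write(self, tag, data):
        self.backing_store[tag] = data       # synchronous write to the backing store
        if tag in self.entries:              # update the cached copy only on a hit
            self.entries[tag] = data
```

A real cache would choose the eviction victim according to its replacement policy rather than evicting an arbitrary entry, and would track dirtiness per cache block.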

==Prefetch==
On a cache read miss, caches with a
demand paging policy read the minimum amount from the backing store. A typical demand-paging virtual memory implementation reads one page of virtual memory (often 4 KB) from disk into the disk cache in RAM. A typical CPU reads a single L2 cache line of 128 bytes from DRAM into the L2 cache, and a single L1 cache line of 64 bytes from the L2 cache into the L1 cache. Caches with a
prefetch input queue or more general
anticipatory paging policy go further—they not only read the data requested, but guess that the next chunk or two of data will soon be required, and so prefetch that data into the cache ahead of time. Anticipatory paging is especially helpful when the backing store has a long latency to read the first chunk and much shorter times to sequentially read the next few chunks, such as
disk storage and DRAM. A few operating systems go further with a
loader that always pre-loads the entire executable into RAM. A few caches go even further, not only pre-loading an entire file, but also starting to load other related files that may soon be requested, such as the
page cache associated with a
prefetcher or the
web cache associated with
link prefetching.
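As a rough illustration of anticipatory prefetching (hypothetical function names, not drawn from any real operating system or CPU), the following Python sketch reads the missed block on demand and then guesses that the next few sequential blocks will be needed soon:

```python
def read_with_prefetch(block_id, cache, read_block, prefetch_count=2):
    """Return the requested block, prefetching the next blocks on a miss.

    cache          -- dict mapping block ids to block data
    read_block     -- function that fetches one block from the slow backing store
    prefetch_count -- how many sequential blocks to guess will be needed next
    """
    if block_id in cache:                     # cache hit: no backing-store access
        return cache[block_id]
    cache[block_id] = read_block(block_id)    # demand read of the missed block
    for next_id in range(block_id + 1, block_id + 1 + prefetch_count):
        if next_id not in cache:              # anticipatory read-ahead
            cache[next_id] = read_block(next_id)
    return cache[block_id]
```

Read-ahead of this kind pays off when fetching the first chunk from the backing store is slow but the following chunks can be read sequentially at little extra cost.

==Examples of hardware caches==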