In general, a program can be made more cache-conscious by improving two kinds of locality:

• Temporal locality, where the algorithm fetches the same pieces of memory multiple times;
• Spatial locality, where subsequent memory accesses are to adjacent or nearby memory addresses.

Cache-oblivious algorithms are typically analyzed using an idealized model of the cache, sometimes called the
cache-oblivious model. This model is much easier to analyze than a real cache's characteristics (which have complex associativity, replacement policies, etc.), but in many cases it is within a constant factor of a more realistic cache's performance. It differs from the
external memory model because cache-oblivious algorithms do not know the
block size or the
cache size. In particular, the cache-oblivious model is an
abstract machine (i.e., a theoretical
model of computation). It is similar to the
RAM machine model, which replaces the
Turing machine's infinite tape with an infinite array. Each location within the array can be accessed in O(1) time, similar to the
random-access memory on a real computer. Unlike the RAM machine model, it also introduces a cache: a second level of storage between the RAM and the CPU. The other differences between the two models are listed below. In the cache-oblivious model:

• Memory is broken into blocks of B objects each.
• A load or a store between main memory and a CPU register may now be serviced from the cache.
• If a load or a store cannot be serviced from the cache, it is called a cache miss.
• A cache miss results in one block being loaded from main memory into the cache. Namely, if the CPU tries to access word w and x is the line containing w, then x is loaded into the cache. If the cache was previously full, a line will be evicted as well (see the replacement policy below).
• The cache holds M objects, where M = Ω(B²). This is also known as the tall cache assumption.
• The cache is fully associative: each line can be loaded into any location in the cache.
• The replacement policy is optimal. In other words, the cache is assumed to be given the entire sequence of memory accesses during algorithm execution. If it needs to evict a line at time t, it will look at its sequence of future requests and evict the line whose first access is furthest in the future. This can be emulated in practice with the Least Recently Used (LRU) policy, which is known to be within a small constant factor of the offline optimal replacement strategy.

To measure the complexity of an algorithm that executes within the cache-oblivious model, we measure the number of
cache misses that the algorithm experiences. Because the model captures the fact that accessing elements in the
cache is much faster than accessing data in
main memory, the
running time of the algorithm is defined only by the number of memory transfers between the cache and main memory. This is similar to the
external memory model, which has all of the features listed above, but cache-oblivious algorithms are independent of the cache parameters B and M. The benefit of such an algorithm is that what is efficient on a cache-oblivious machine is likely to be efficient on many real machines, without fine-tuning for particular machine parameters. For many problems, an optimal cache-oblivious algorithm will also be optimal for a machine with more than two
memory hierarchy levels.

==Examples==
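Before turning to concrete algorithms, the model itself can be made tangible with a toy miss-counting simulator: a fully associative cache of M/B lines with block size B, using LRU replacement to stand in for the optimal offline policy. This is a sketch for illustration only; the function and parameter names are not from any standard library.

```python
# Toy simulator of memory behavior in the cache-oblivious model (sketch):
# a fully associative cache of M/B lines, block size B, LRU replacement.
from collections import OrderedDict

def count_misses(accesses, B, M):
    """Count cache misses for a sequence of word addresses."""
    lines = OrderedDict()              # block id -> None, kept in LRU order
    capacity = M // B                  # number of lines the cache can hold
    misses = 0
    for addr in accesses:
        block = addr // B              # the line containing this word
        if block in lines:
            lines.move_to_end(block)   # hit: mark as most recently used
        else:
            misses += 1                # miss: load the block from memory
            if len(lines) == capacity:
                lines.popitem(last=False)  # evict the least recently used line
            lines[block] = None
    return misses

# Spatial locality: a sequential scan of N words faults only on every
# B-th access, so it incurs about N/B misses.
print(count_misses(range(1024), 8, 128))       # 1024/8 = 128 misses

# Temporal locality: re-scanning a working set that fits in the cache
# misses only on the first pass.
print(count_misses(list(range(64)) * 4, 8, 128))  # 64/8 = 8 misses
```

Note that the scan's miss count depends on B even though the access pattern was written without knowing B; that is exactly the situation a cache-oblivious algorithm is analyzed in.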
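A classic cache-oblivious algorithm is matrix transposition by divide and conquer. The sketch below (names are illustrative, and for simplicity the recursion goes all the way down to single elements rather than stopping at a small base-case block) repeatedly splits the longer dimension, keeping subproblems roughly square; once a subproblem fits in cache, all of its work is serviced from cache, yet the code never mentions B or M.

```python
# Sketch of cache-oblivious matrix transpose by divide and conquer.
def transpose(A, T, r0=0, c0=0, rows=None, cols=None):
    """Write the transpose of A (a rows-by-cols list of lists) into T."""
    if rows is None:
        rows, cols = len(A), len(A[0])
    if rows <= 1 and cols <= 1:
        # Base case: copy a single element.
        T[c0][r0] = A[r0][c0]
    elif rows >= cols:
        # Split the taller dimension in half and recurse on both halves.
        half = rows // 2
        transpose(A, T, r0, c0, half, cols)
        transpose(A, T, r0 + half, c0, rows - half, cols)
    else:
        # Split the wider dimension in half and recurse on both halves.
        half = cols // 2
        transpose(A, T, r0, c0, rows, half)
        transpose(A, T, r0, c0 + half, rows, cols - half)

A = [[1, 2, 3],
     [4, 5, 6]]
T = [[0, 0], [0, 0], [0, 0]]
transpose(A, T)
print(T)   # [[1, 4], [2, 5], [3, 6]]
```

Because subproblems shrink until they fit in a cache line, this recursion incurs O(mn/B) memory transfers for an m-by-n matrix under the tall cache assumption, matching the external-memory optimum without knowing the cache parameters.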