=== Stream buffers ===
• Stream buffers were developed based on the "one block lookahead" (OBL) scheme proposed by Alan Jay Smith, and many variations of this method have been developed since.
• The basic idea is that the cache miss address (and the k subsequent addresses) are fetched into a separate buffer of depth k. This buffer is called a stream buffer and is separate from the cache. The processor then consumes data or instructions from the stream buffer if the address associated with a prefetched block matches the address requested by the program executing on the processor.
[Figure: A typical stream buffer setup as originally proposed by Norman Jouppi in 1990.]
• For each new miss, a new stream buffer is allocated, and it operates in the same way as described above.
• The ideal depth of the stream buffer is subject to experimentation against various benchmarks.
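The stream buffer behaviour described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not a model of any real hardware: the names `StreamBuffer` and `simulate` are hypothetical, addresses are treated as block numbers, every access is assumed to miss in the cache, only the head of the buffer is compared against the requested address, and a single buffer is reallocated on each new miss.

```python
from collections import deque


class StreamBuffer:
    """Toy model of one stream buffer of depth k (holds block addresses only)."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()  # prefetched block addresses, oldest first

    def allocate(self, miss_addr):
        """On a miss to miss_addr, prefetch the next `depth` sequential blocks."""
        self.fifo = deque(miss_addr + i for i in range(1, self.depth + 1))

    def lookup(self, addr):
        """Return True if addr is at the head of the buffer, then advance the stream."""
        if self.fifo and self.fifo[0] == addr:
            self.fifo.popleft()
            # Keep the buffer full by prefetching the next sequential block.
            next_addr = (self.fifo[-1] if self.fifo else addr) + 1
            self.fifo.append(next_addr)
            return True
        return False


def simulate(accesses, depth=4):
    """Count stream-buffer hits, assuming every access misses in the cache."""
    buf = StreamBuffer(depth)
    hits = 0
    for addr in accesses:
        if buf.lookup(addr):
            hits += 1
        else:
            buf.allocate(addr)  # new miss: (re)allocate the stream
    return hits
```

With a sequential run such as `simulate([100, 101, 102, 103])`, every access after the first is served from the buffer, whereas a stride-2 pattern such as `[1, 3, 5]` never matches the head and gets no benefit. Varying `depth` in such a simulation is one simple way to explore the depth trade-off mentioned above.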
=== Strided prefetching ===
This type of prefetching monitors the delta between the addresses of successive memory accesses and looks for patterns within it.
==== Regular strides ====
In this pattern, consecutive memory accesses are made to blocks that are s addresses apart. In this case, the prefetcher calculates the stride s and uses it to compute the memory address for prefetching. For example, if s = 4 and the current access is to address A, the address to be prefetched would be A+4.
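A regular-stride detector can be sketched in a few lines. The class name `StridePrefetcher` and the rule of waiting until the same stride has been seen twice in a row are assumptions made for illustration; hardware designs typically also track strides per load instruction, which this sketch omits.

```python
class StridePrefetcher:
    """Toy stride detector: predicts addr + s once stride s repeats."""

    def __init__(self):
        self.last_addr = None
        self.last_stride = None

    def access(self, addr):
        """Observe one access; return the address to prefetch, or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Only predict when the same non-zero stride is seen twice in a row.
            if stride != 0 and stride == self.last_stride:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction
```

For the access sequence 100, 104, 108, the third access confirms s = 4, so the sketch returns 112 as the address to prefetch.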
==== Irregular spatial strides ====
In this case, the delta between the addresses of consecutive memory accesses is variable but still follows a pattern. Some prefetcher designs exploit this property to predict and prefetch for future accesses.
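One hypothetical way to exploit a repeating but non-constant delta pattern is to remember which delta followed each delta the last time it was seen. The sketch below is an illustrative assumption, not a description of any specific published design; the class name `DeltaCorrelationPrefetcher` and its single-entry-per-delta table are invented for the example.

```python
class DeltaCorrelationPrefetcher:
    """Toy predictor: maps each observed delta to the delta that followed it."""

    def __init__(self):
        self.last_addr = None
        self.last_delta = None
        self.next_delta = {}  # delta -> delta that followed it most recently

    def access(self, addr):
        """Observe one access; return the address to prefetch, or None."""
        prediction = None
        if self.last_addr is not None:
            delta = addr - self.last_addr
            if self.last_delta is not None:
                # Learn: last_delta was followed by delta.
                self.next_delta[self.last_delta] = delta
            if delta in self.next_delta:
                # Predict: assume the same follow-up delta as last time.
                prediction = addr + self.next_delta[delta]
            self.last_delta = delta
        self.last_addr = addr
        return prediction
```

For accesses 0, 1, 4, 5, 8, 9 the deltas alternate between +1 and +3. A constant-stride detector never locks on to this, while the table above starts predicting correctly from the fourth access onward.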
=== Irregular temporal prefetching ===
This class of prefetchers looks for memory access streams that repeat over time. For example, in the stream of memory accesses N, A, B, C, E, G, H, A, B, C, I, J, K, A, B, C, L, M, N, O, A, B, C, ..., the sub-stream A, B, C repeats over time. Several design variations have tried to provide more efficient implementations.
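The idea of replaying a previously seen stream can be illustrated with a successor table that records, for each address, the address that followed it most recently. This is a deliberately simplified sketch under that assumption; the function name `temporal_predictions` is hypothetical, and addresses are represented by the letter labels used in the example above.

```python
def temporal_predictions(stream):
    """For each access, return the predicted next address (or None).

    The prediction is the address that followed the current one the last
    time it was seen, which is a simplification of temporal prefetching.
    """
    successor = {}  # address -> address that followed it most recently
    prev = None
    predictions = []
    for addr in stream:
        if prev is not None:
            successor[prev] = addr  # learn from the access just completed
        predictions.append(successor.get(addr))
        prev = addr
    return predictions


stream = ["N", "A", "B", "C", "E", "G", "H", "A", "B", "C", "I"]
preds = temporal_predictions(stream)
```

On the second occurrence of A the table predicts B, and on the second B it predicts C, so the repeating A, B, C stream is covered. The prediction after the second C is E, which is wrong in this trace (the actual next access is I), showing that such a table only helps while the stream repeats.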
=== Collaborative prefetching ===
Computer applications generate a variety of access patterns. The processor and memory subsystem architectures used to execute these applications further disambiguate the memory access patterns they generate. Hence, the effectiveness and efficiency of prefetching schemes often depend on the application and the architecture used to execute it. Recent research has focused on building collaborative mechanisms that use multiple prefetching schemes together for better prefetching coverage and accuracy.

== Methods of software prefetching ==