{{short description|Computer processing technique to boost memory performance}}
'''Cache prefetching''' is a technique used by computer processors to boost execution performance by fetching instructions or data from their original storage in slower memory to a faster local memory before it is actually needed (hence the term 'prefetch').<ref name=":3">{{Cite journal |last=Smith |first=Alan Jay |date=1982-09-01 |title=Cache Memories |journal=ACM Computing Surveys |volume=14 |issue=3 |pages=473–530 |doi=10.1145/356887.356892 |s2cid=6023466 |issn=0360-0300}}</ref>
== Data vs. instruction cache prefetching ==
Cache prefetching can either fetch data or instructions into cache.
* '''Data prefetching''' fetches data before it is needed. Because data access patterns show less regularity than instruction patterns, accurate data prefetching is generally more challenging than instruction prefetching.
* '''Instruction prefetching''' fetches instructions before they need to be executed. The first mainstream microprocessors to use some form of instruction prefetch were the [[Intel]] [[8086]] (six bytes) and the [[Motorola]] [[68000]] (four bytes). All modern high-performance processors use prefetching techniques.
== Hardware vs. software cache prefetching ==
Cache prefetching can be accomplished either by hardware or by software.<ref name=":2" />
* '''Hardware based prefetching''' is typically accomplished by having a dedicated hardware mechanism in the processor that watches the stream of instructions or data being requested by the executing program, recognizes the next few elements that the program might need based on this stream, and prefetches them into the processor's cache.<ref name=":2" />
* '''Software based prefetching''' is typically accomplished by having the compiler analyze the code and insert additional "prefetch" instructions into the program during compilation.<ref name=":2" />
== Methods of hardware prefetching ==
=== Stream buffers ===
* Stream buffers were developed based on the concept of the "one block lookahead" (OBL) scheme proposed by [[Alan Jay Smith]].<ref name=":3" />
* Stream [[Data buffer|buffers]] are one of the most common hardware-based prefetching techniques in use. This technique was originally proposed by [[Norman Jouppi]] in 1990,<ref name=":1">{{cite conference |last=Jouppi |first=Norman P. |title=Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers |conference=17th Annual International Symposium on Computer Architecture |year=1990 |pages=364–373}}</ref> and many variations of this method have been developed since. The basic idea is that the cache miss address (and subsequent addresses) are fetched into a separate buffer of depth <math>k</math>. This buffer, called a stream buffer, is separate from the cache. The processor then consumes data or instructions from the stream buffer whenever the address of a prefetched block matches the address requested by the executing program. The figure below illustrates this setup:
[[File:CachePrefetching_StreamBuffers.png|center|thumb|450x450px|A typical stream buffer setup, as originally proposed by Norman Jouppi in 1990<ref name=":1" />]]
* Whenever the prefetch mechanism detects a miss on a memory block, say ''A'', it allocates a stream to begin prefetching successive blocks from the missed block onward. If the stream buffer can hold 4 blocks, then blocks ''A+1'', ''A+2'', ''A+3'' and ''A+4'' would be prefetched and held in the allocated stream buffer. If the processor consumes ''A+1'' next, it is moved "up" from the stream buffer into the processor's cache, the first entry of the stream buffer becomes ''A+2'', and so on. This pattern of prefetching successive blocks is called '''sequential prefetching'''. It is mainly used when contiguous locations, such as instructions, are to be prefetched. (A code sketch of this allocate-and-shift behavior follows this list.)
* This mechanism can be scaled up by adding multiple such stream buffers, each of which maintains a separate prefetch stream. Every new miss allocates a new stream buffer, which operates in the manner described above.
* The ideal depth of the stream buffer is subject to experimentation against various benchmarks and depends on the rest of the [[microarchitecture]] involved.
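A minimal software sketch of the allocate-and-shift behavior described above, assuming a single stream buffer of depth 4; the type and function names are illustrative only and do not model any real hardware interface:<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdbool.h>

#define DEPTH 4  /* stream buffer depth: number of prefetched blocks */

/* Toy model of one stream buffer holding DEPTH successive block
   addresses, head first. */
typedef struct {
    unsigned long blocks[DEPTH];
    bool valid;
} stream_buffer;

/* On a miss to block `miss`, allocate the stream: prefetch the DEPTH
   successive blocks miss+1 .. miss+DEPTH. */
static void allocate_stream(stream_buffer *sb, unsigned long miss)
{
    for (int i = 0; i < DEPTH; i++)
        sb->blocks[i] = miss + 1 + i;
    sb->valid = true;
}

/* Check the head of the stream buffer on each access. A hit moves the
   head block "up" into the cache, shifts the remaining entries forward,
   and refills the tail with the next sequential block. */
static bool access_block(stream_buffer *sb, unsigned long addr)
{
    if (!sb->valid || sb->blocks[0] != addr)
        return false;  /* miss: a real prefetcher would allocate a new stream */
    for (int i = 0; i < DEPTH - 1; i++)
        sb->blocks[i] = sb->blocks[i + 1];
    sb->blocks[DEPTH - 1] = sb->blocks[DEPTH - 2] + 1;
    return true;
}

int main(void)
{
    stream_buffer sb = { .valid = false };
    allocate_stream(&sb, 100);               /* miss on block A = 100 */
    printf("%d\n", access_block(&sb, 101));  /* A+1 is consumed next: hit */
    printf("%d\n", access_block(&sb, 102));  /* buffer head is now A+2: hit */
    return 0;
}
</syntaxhighlight>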
=== Strided prefetching ===
This type of prefetching monitors the delta between the addresses of the memory accesses made and looks for patterns within it.
==== Regular strides ====
In this pattern, consecutive memory accesses are made to blocks that are ''s'' addresses apart. In this case, the prefetcher calculates ''s'' and uses it to compute the address to prefetch. E.g., if ''s'' is 4 and the current access is to address ''A'', the address to be prefetched would be ''A''&nbsp;+&nbsp;4.
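A minimal sketch of stride detection for a single access stream, under the assumption that a stride is only trusted once it has repeated; a real stride prefetcher keeps a table of such entries, typically indexed by the program counter of the load instruction, and all names here are illustrative:<syntaxhighlight lang="c">
typedef struct {
    unsigned long last_addr;  /* address of the previous access */
    long stride;              /* last observed delta s */
    int confirmed;            /* has the same s been seen twice in a row? */
} stride_entry;

/* Feed one access address; returns the address to prefetch (A + s),
   or 0 when no stable stride has been established yet. */
unsigned long observe_access(stride_entry *e, unsigned long addr)
{
    long delta = (long)(addr - e->last_addr);
    e->confirmed = (delta == e->stride && delta != 0);
    e->stride = delta;
    e->last_addr = addr;
    return e->confirmed ? addr + (unsigned long)delta : 0;
}
</syntaxhighlight>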
==== Irregular spatial strides ====
In this case, the delta between the addresses of consecutive memory accesses is variable but still follows a pattern. Some prefetcher designs exploit this property to predict and prefetch future accesses.
==== Irregular temporal prefetching ====
This class of prefetchers looks for memory access streams that repeat over time and records them, so that when a stream begins to repeat, the remainder of it can be prefetched. For example, in the access stream ''A, B, C, D, A, B, C, E, A, B, C, ...'', the sub-stream ''A, B, C'' recurs, so once it has been recorded, an access to ''A'' can trigger prefetches of ''B'' and ''C''.
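A toy sketch of temporal correlation in the style of a Markov prefetcher: it remembers which block followed each block last time and predicts the same successor when the block is seen again. The direct-indexed table and its size are illustrative simplifications:<syntaxhighlight lang="c">
#define TABLE_SIZE 256

static unsigned long successor[TABLE_SIZE];  /* 0 = no prediction yet */
static unsigned long last_block;

/* Feed one accessed block; returns a candidate block to prefetch,
   or 0 if this block has not been seen before. */
unsigned long predict_next(unsigned long block)
{
    unsigned long predicted = successor[block % TABLE_SIZE];
    successor[last_block % TABLE_SIZE] = block;  /* record the transition */
    last_block = block;
    return predicted;
}
</syntaxhighlight>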
=== Collaborative prefetching ===
Computer applications generate a variety of access patterns. The processor and memory subsystem architectures used to execute these applications further influence the memory access patterns they generate. Hence, the effectiveness and efficiency of prefetching schemes often depend on the application and the architecture used to execute it. Recent research has therefore focused on building collaborative mechanisms that synergistically combine multiple prefetching schemes for better prefetching coverage and accuracy.
== Methods of software prefetching ==
=== Compiler directed prefetching ===
[[Compiler]] directed prefetching is widely used within loops with a large number of iterations. In this technique, the compiler predicts future cache misses and inserts a prefetch instruction based on the miss penalty and the execution time of the instructions.
These prefetches are non-blocking memory operations, i.e. these memory accesses do not interfere with actual memory accesses. They do not change the state of the processor or cause page faults.
One main advantage of software prefetching is that it reduces the number of compulsory cache misses.<ref name=":2" />
The following example shows how a prefetch instruction would be added into code to improve [[Cache performance measurement and metric|cache performance]].
Consider a ''for'' loop as shown below:<syntaxhighlight lang="c">
for (int i = 0; i < 1024; i++) {
    array1[i] = 2 * array1[i];
}
</syntaxhighlight>At each iteration, the ''i''<sup>th</sup> element of the array <code>array1</code> is accessed. Therefore, the elements that will be accessed in future iterations can be prefetched by inserting a "prefetch" instruction, as shown below:<syntaxhighlight lang="c">
for (int i = 0; i < 1024; i++) {
    prefetch(array1[i + k]);
    array1[i] = 2 * array1[i];
}
</syntaxhighlight>Here, the prefetch stride <code>k</code> depends on two factors: the cache miss penalty and the time it takes to execute a single iteration of the ''for'' loop. For instance, if one iteration of the loop takes 7 cycles to execute and the cache miss penalty is 49 cycles, then ''k'' = 49/7 = 7 should be used, which means prefetching 7 elements ahead.
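The <code>prefetch()</code> call above is pseudocode. As a minimal compilable sketch, assuming GCC or Clang, the same loop can be written with the <code>__builtin_prefetch</code> intrinsic; the function name <code>scale_array</code> and the fixed bounds are illustrative only:<syntaxhighlight lang="c">
#include <stddef.h>

#define N 1024
#define K 7  /* prefetch distance: miss penalty / cycles per iteration */

void scale_array(int *array1)
{
    for (size_t i = 0; i < N; i++) {
        /* Hint to load array1[i + K] into cache ahead of its use; the
           intrinsic compiles to a no-op on targets without prefetch. */
        if (i + K < N)
            __builtin_prefetch(&array1[i + K]);
        array1[i] = 2 * array1[i];
    }
}
</syntaxhighlight>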
== Comparison of hardware and software prefetching ==
* While software prefetching requires programmer or [[compiler]] intervention, hardware prefetching requires special hardware mechanisms.<ref name=":2" />
* Software prefetching works well only with loops where there is regular array access, as the programmer has to hand-code the prefetch instructions, whereas hardware prefetchers work dynamically based on the program's behavior at [[Run time (program lifecycle phase)|runtime]].<ref name=":2" />
* Hardware prefetching also has less CPU overhead when compared to software prefetching.<ref>{{Cite conference |last1=Callahan |first1=David |last2=Kennedy |first2=Ken |last3=Porterfield |first3=Allan |date=1991-01-01 |title=Software Prefetching |conference=Fourth International Conference on Architectural Support for Programming Languages and Operating Systems |___location=Santa Clara, California, USA |publisher=Association for Computing Machinery |pages=40–52}}</ref> However, software prefetching can mitigate certain constraints of hardware prefetching, leading to improvements in performance.<ref>{{Cite journal |last1=Lee |first1=Jaekyu |last2=Kim |first2=Hyesoon |last3=Vuduc |first3=Richard |title=When Prefetching Works, When It Doesn't, and Why |journal=ACM Transactions on Architecture and Code Optimization |date=2012 |volume=9 |pages=1–29}}</ref>
== Metrics of cache prefetching ==
There are three main metrics to judge cache prefetching:<ref name=":2">{{Cite book |last=Solihin |first=Yan |title=Fundamentals of Parallel Multicore Architecture |publisher=Chapman and Hall/CRC |year=2016 |page=163}}</ref>
=== Coverage ===
Coverage is the fraction of total misses that are eliminated because of prefetching, i.e.
<math>\text{Coverage} = \frac{\text{Cache Misses eliminated by Prefetching}}{\text{Total Cache Misses}}</math>,
where <math>\text{Total Cache Misses} = (\text{Cache Misses eliminated by Prefetching}) + (\text{Cache Misses not eliminated by Prefetching})</math>.
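As a hypothetical illustration, suppose a program incurs 100 cache misses when run without prefetching, and a prefetcher eliminates 60 of them. Then

<math>\text{Coverage} = \frac{60}{100} = 0.6</math>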
=== Accuracy ===
Accuracy is the fraction of total prefetches that were useful, i.e. the ratio of the number of prefetched memory addresses that were actually referenced by the program to the total number of prefetches done.
<math>\text{Prefetch Accuracy} = \frac{\text{Cache Misses eliminated by prefetching}}{(\text{Useless Cache Prefetches}) + (\text{Cache Misses eliminated by prefetching})}</math>
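Continuing the hypothetical numbers above: if eliminating those 60 misses required issuing 80 prefetches, 20 of which were never referenced, then

<math>\text{Prefetch Accuracy} = \frac{60}{20 + 60} = 0.75</math>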
While it appears that having perfect accuracy might imply that there are no misses, this is not the case. The prefetches themselves might result in new misses if the prefetched blocks are placed directly into the cache. Although these may be a small fraction of the total number of misses seen without any prefetching, this is a non-zero number of misses.
=== Timeliness ===
The qualitative definition of timeliness is how early a block is prefetched versus when it is actually referenced. An example to further explain timeliness is as follows:

Consider a for loop where each iteration takes 3 cycles to execute and the "prefetch" operation takes 12 cycles. This implies that for the prefetched data to be useful, the prefetch must be started <math>12/3 = 4</math> iterations prior to its usage to maintain timeliness.
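Under the simplifying assumption that every iteration costs the same number of cycles, this calculation generalizes to

<math>\text{Prefetch Distance} = \left\lceil \frac{\text{Prefetch Latency}}{\text{Cycles per Iteration}} \right\rceil = \left\lceil \frac{12}{3} \right\rceil = 4</math>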
== See also ==