[[File:CachePrefetching_StreamBuffers.png|center|A typical stream buffer setup as originally proposed by Norman Jouppi in 1990<ref name=":1"/>|alt=A typical stream buffer setup as originally proposed|thumb|400x400px]]
* Whenever the prefetch mechanism detects a miss on a memory block, say A, it allocates a stream to begin prefetching successive blocks from the missed block onward. If the stream buffer can hold 4 blocks, it prefetches A+1, A+2, A+3, A+4 and holds them in the allocated stream buffer. If the processor consumes A+1 next, that block is moved "up" from the stream buffer to the processor's cache; the first entry of the stream buffer then becomes A+2, and so on. This pattern of prefetching successive blocks is called '''sequential prefetching'''. It is mainly used when contiguous locations are to be prefetched, for example when prefetching instructions.
* This mechanism can be scaled up by adding multiple such 'stream buffers' - each of which would maintain a separate prefetch stream.<ref>{{Cite journal |last=Ishii |first=Yasuo |last2=Inaba |first2=Mary |last3=Hiraki |first3=Kei |date=2009-06-08 |title=Access map pattern matching for data cache prefetch |url=https://doi.org/10.1145/1542275.1542349 |journal=Proceedings of the 23rd international conference on Supercomputing |series=ICS '09 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=499–500 |doi=10.1145/1542275.1542349 |isbn=978-1-60558-498-0}}</ref>
* The ideal depth of the stream buffer is subject to experimentation against various benchmarks<ref name=":1" /> and depends on the rest of the [[microarchitecture]] involved.<ref>{{Cite journal |last=Srinath |first=Santhosh |last2=Mutlu |first2=Onur |last3=Kim |first3=Hyesoon |last4=Patt |first4=Yale N. |date=February 2007}}</ref>
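The allocate-and-consume behavior described above can be sketched as a simple model. This is an illustrative simplification, not a cited design: the class name, the depth of 4, and plain-integer block addresses are all assumptions.

```python
from collections import deque

class StreamBuffer:
    """Toy model of a single sequential-prefetch stream buffer."""

    def __init__(self, depth=4):
        self.depth = depth
        self.buffer = deque()          # prefetched blocks; head = next expected block

    def allocate(self, missed_block):
        # On a miss to block A, prefetch A+1 .. A+depth into the buffer.
        self.buffer = deque(missed_block + i for i in range(1, self.depth + 1))

    def access(self, block):
        # If the head matches, "move it up" to the cache and prefetch one
        # more successive block to keep the buffer full.
        if self.buffer and self.buffer[0] == block:
            self.buffer.popleft()
            self.buffer.append(block + self.depth)
            return True                # hit in the stream buffer
        return False                   # miss; a real design might reallocate

sb = StreamBuffer(depth=4)
sb.allocate(100)                       # miss on A=100: buffer holds 101..104
print(sb.access(101))                  # True: 101 moves to the cache
print(list(sb.buffer))                 # [102, 103, 104, 105]
```

A hit on the head of the buffer both services the access and extends the stream, which is why the buffer stays full as long as the sequential pattern continues.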
=== Strided prefetching ===
This type of prefetching monitors the delta between the addresses of the memory accesses and looks for patterns within it.
==== Regular strides ====
In this pattern, consecutive memory accesses are made to blocks that are <math>s</math> addresses apart.<ref name=":2" /><ref>{{Cite journal |last=Kondguli |first=Sushant |last2=Huang |first2=Michael |date=November 2017 |title=T2: A Highly Accurate and Energy Efficient Stride Prefetcher |url=https://ieeexplore.ieee.org/document/8119237 |journal=2017 IEEE International Conference on Computer Design (ICCD) |pages=373–376 |doi=10.1109/ICCD.2017.64}}</ref> In this case, the prefetcher calculates the stride <math>s</math> and uses it to compute the memory address to prefetch. For example, if <math>s</math> is 4 and the last accessed address is A, the address to be prefetched would be A+4.
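A minimal sketch of this stride calculation, assuming the common heuristic that a stride must repeat once before prefetching is triggered (the function name, confirmation rule, and `degree` parameter are illustrative, not from a specific design):

```python
def stride_prefetch(addresses, degree=1):
    """Detect a constant stride s between consecutive accesses and
    return the addresses that would be prefetched after each access."""
    prefetches = []
    prev_addr, prev_stride = None, None
    for addr in addresses:
        if prev_addr is not None:
            s = addr - prev_addr
            if s == prev_stride:       # stride seen twice in a row: confirmed
                # Prefetch the next `degree` blocks along the stride.
                prefetches.extend(addr + s * i for i in range(1, degree + 1))
            prev_stride = s
        prev_addr = addr
    return prefetches

# Accesses with stride s = 4: once the stride repeats, A+4 is prefetched.
print(stride_prefetch([0, 4, 8, 12]))  # [12, 16]
```

Real stride prefetchers typically track one stride per load instruction (indexed by program counter) rather than a single global stream, but the confirm-then-prefetch logic is the same.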
==== Irregular strides ====
In this case, the delta between the addresses of consecutive memory accesses is variable but still follows a pattern. Some prefetcher designs exploit this property to predict and prefetch future accesses.<ref>{{Citation |last=Grannaes |first=Marius |title=Storage Efficient Hardware Prefetching using Delta-Correlating Prediction Tables |url=https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.229.3483 |access-date=2022-03-16 |last2=Jahre |first2=Magnus |last3=Natvig |first3=Lasse}}</ref><ref>{{Cite journal |last=Shevgoor |first=Manjunath |last2=Koladiya |first2=Sahil |last3=Balasubramonian |first3=Rajeev |last4=Wilkerson |first4=Chris |last5=Pugsley |first5=Seth H |last6=Chishti |first6=Zeshan |date=December 2015}}</ref>
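The delta-correlation idea can be sketched as follows: match the most recent pair of deltas against the earlier delta history, and replay the deltas that followed the match. This is a heavily simplified reading of the delta-correlating approach; the function name, two-delta match window, and `lookahead` parameter are assumptions, not the cited designs.

```python
def delta_correlate(addresses, lookahead=2):
    """Predict future addresses by correlating on the last two deltas."""
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if len(deltas) < 2:
        return []
    pattern = deltas[-2:]              # the two most recent deltas
    # Search older history (most recent first) for the same delta pair.
    for i in range(len(deltas) - 3, -1, -1):
        if deltas[i:i + 2] == pattern:
            replay = deltas[i + 2:i + 2 + lookahead]
            addr, out = addresses[-1], []
            for d in replay:           # replay the deltas that followed
                addr += d
                out.append(addr)
            return out
    return []

# Irregular but repeating delta pattern: +1, +3, +1, +3, ...
print(delta_correlate([10, 11, 14, 15, 18, 19]))  # predicts [22, 23]
```

Note the deltas here are variable (+1, +3) so a constant-stride prefetcher would never confirm a stride, yet the pattern of deltas repeats and is therefore predictable.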
=== Temporal prefetching ===
This class of prefetchers looks for memory access streams that repeat over time.<ref>{{Cite journal |last=Joseph |first=Doug |last2=Grunwald |first2=Dirk |date=1997-05-01 |title=Prefetching using Markov predictors |url=https://doi.org/10.1145/264107.264207 |journal=Proceedings of the 24th annual international symposium on Computer architecture |series=ISCA '97 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=252–263 |doi=10.1145/264107.264207 |isbn=978-0-89791-901-2}}</ref><ref>{{Cite journal |last=Collins |first=J. |last2=Sair |first2=S. |last3=Calder |first3=B. |last4=Tullsen |first4=D.M. |date=November 2002}}</ref>
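In the Markov-predictor style of temporal prefetching, the predictor remembers which block followed each block in the past and predicts that successor again. The sketch below is a minimal illustration of that idea only; the function names, table layout, and "most recent successor" policy are assumptions, not the cited implementations.

```python
def build_markov_table(history):
    """Record, for each block, the blocks observed to follow it."""
    table = {}
    for cur, nxt in zip(history, history[1:]):
        table.setdefault(cur, []).append(nxt)
    return table

def predict(table, block, degree=1):
    # Prefetch the most recently observed successor(s) of this block.
    return table.get(block, [])[-degree:]

history = [1, 7, 3, 1, 7, 3, 2]        # the stream 1 -> 7 -> 3 repeats over time
table = build_markov_table(history)
print(predict(table, 1))               # [7]: block 7 followed block 1 before
```

Because the correlation is between specific addresses rather than address deltas, temporal prefetchers can capture repeating pointer-chasing streams that stride and delta schemes miss, at the cost of large correlation tables.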
=== Collaborative prefetching ===
Computer applications generate a variety of access patterns, and the processor and memory subsystem architectures used to execute these applications further shape the memory access patterns they produce. Hence, the effectiveness and efficiency of prefetching schemes often depend on the application and the architecture used to execute it.<ref>{{Cite journal |last=Kim |first=Jinchun |last2=Teran |first2=Elvira |last3=Gratz |first3=Paul V. |last4=Jiménez |first4=Daniel A. |last5=Pugsley |first5=Seth H. |last6=Wilkerson |first6=Chris |date=2017-05-12 |title=Kill the Program Counter: Reconstructing Program Behavior in the Processor Cache Hierarchy |url=https://dl.acm.org/doi/10.1145/3093336.3037701 |journal=ACM SIGPLAN Notices |language=en |volume=52 |issue=4 |pages=737–749 |doi=10.1145/3093336.3037701 |issn=0362-1340}}</ref>
== Methods of software prefetching ==