Cache prefetching: Difference between revisions

* Whenever the prefetch mechanism detects a miss on a memory block, say A, it allocates a stream to begin prefetching successive blocks from the missed block onward. If the stream buffer can hold 4 blocks, the prefetcher fetches A+1, A+2, A+3, A+4 and holds them in the allocated stream buffer. If the processor consumes A+1 next, that block is moved "up" from the stream buffer to the processor's cache. The first entry of the stream buffer is then A+2, and so on. This pattern of prefetching successive blocks is called '''sequential prefetching'''. It is mainly used when contiguous locations are to be prefetched, for example when prefetching instructions.
* This mechanism can be scaled up by adding multiple such stream buffers, each of which maintains a separate prefetch stream.<ref>{{Cite conference |last1=Ishii |first1=Yasuo |last2=Inaba |first2=Mary |last3=Hiraki |first3=Kei |date=2009-06-08 |title=Access map pattern matching for data cache prefetch |url=https://doi.org/10.1145/1542275.1542349 |conference=ICS 2009 |location=New York, New York, USA |publisher=Association for Computing Machinery |pages=499–500 |doi=10.1145/1542275.1542349 |isbn=978-1-60558-498-0 |book-title=Proceedings of the 23rd International Conference on Supercomputing |s2cid=37841036}}</ref> For each new miss, a new stream buffer is allocated, and it operates in the same way as described above.
* The ideal depth of the stream buffer is determined experimentally against various benchmarks<ref name=":1" /> and depends on the rest of the [[microarchitecture]] involved.<ref>{{Cite conference |last1=Srinath |first1=Santhosh |last2=Mutlu |first2=Onur |last3=Kim |first3=Hyesoon |author3-link=Hyesoon Kim|last4=Patt |first4=Yale N.|author4-link=Yale Patt |date=February 2007 |title=Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers |url=https://ieeexplore.ieee.org/document/4147648 |conference=2007 IEEE 13th International Symposium on High Performance Computer Architecture |pages=63–74 |doi=10.1109/HPCA.2007.346185|isbn=978-1-4244-0804-7 |s2cid=6909725 }}</ref>
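
The stream-buffer behaviour described in the list above can be sketched as a small software model. This is an illustrative simulation only, not a description of any particular hardware design: the class names <code>StreamBuffer</code> and <code>StreamPrefetcher</code>, the default depth of 4, the limit of two streams, refilling the tail of the buffer after each consumed block, and replacing the oldest stream when all buffers are in use are assumptions made for the example.

```python
from collections import deque


class StreamBuffer:
    """Hypothetical model of one stream buffer: a FIFO of prefetched block numbers."""

    def __init__(self, miss_block, depth=4):
        # After a miss on block A, prefetch A+1 .. A+depth.
        self.next_block = miss_block + 1
        self.queue = deque()
        for _ in range(depth):
            self._prefetch_next()

    def _prefetch_next(self):
        # Assumption: the buffer fetches the next sequential block into its tail.
        self.queue.append(self.next_block)
        self.next_block += 1

    def lookup(self, block):
        """Return True if `block` is at the head; consume it and refill the tail."""
        if self.queue and self.queue[0] == block:
            self.queue.popleft()      # block moves "up" to the cache
            self._prefetch_next()     # keep the buffer full (assumed policy)
            return True
        return False


class StreamPrefetcher:
    """Hypothetical prefetcher with several stream buffers (oldest replaced first)."""

    def __init__(self, num_streams=2, depth=4):
        self.num_streams = num_streams
        self.depth = depth
        self.streams = deque()

    def access(self, block, cache):
        """Simulate one access; `cache` is a set of block numbers already cached."""
        if block in cache:
            return "cache hit"
        for stream in self.streams:
            if stream.lookup(block):
                cache.add(block)
                return "stream hit"
        # Miss in the cache and in every stream buffer: allocate a new stream.
        if len(self.streams) == self.num_streams:
            self.streams.popleft()    # assumed replacement policy: drop the oldest
        self.streams.append(StreamBuffer(block, self.depth))
        cache.add(block)
        return "miss"
```

In this model, an access to block 100 on an empty cache is a miss that allocates a stream holding blocks 101 to 104. A following access to block 101 is served from the head of that stream, after which the first entry of the buffer is block 102.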
 
=== Strided prefetching ===