Cache hierarchy
 
=== Disadvantages ===
* Cache memory comes at a higher [[marginal cost]] than main memory and thus can increase the cost of the overall system.<ref>Vojin G. Oklobdzija, 2017. Digital Design and Fabrication. CRC Press. p. 4. {{ISBN|978-0-8493-8604-6}}.</ref>
* Cached data is retained only as long as power is provided to the cache.
* Caches increase the on-chip area required for the memory system.<ref>{{Cite web|url=https://www.bottomupcs.com/memory.xhtml|title=Memory Hierarchy}}</ref>
 
=== Banked versus unified ===
In a banked cache, the cache is divided into a cache dedicated to [[Machine code|instruction]] storage and a cache dedicated to data. In contrast, a unified cache contains both the instructions and data in the same cache.<ref>Yan Solihin, 2015. Fundamentals of Parallel Multicore Architecture. CRC Press. p. 150. {{ISBN|978-1-4822-1119-1}}.</ref> While a program executes, the processor accesses the L1 cache (or the upper-most cache, the one closest to the processor) to retrieve both instructions and data. Serving both kinds of access simultaneously from a unified cache requires multiple ports, which increases access time. Having multiple ports requires additional hardware and wiring, leading to significant structural overhead between the caches and processing units.<ref>Steve Heath, 2002. Embedded Systems Design. Elsevier. p. 106. {{ISBN|978-0-08-047756-5}}.</ref> To avoid this, the L1 cache is often organized as a banked cache, which results in fewer ports, less hardware, and generally lower access times,<ref name=":1" /> as the sketch below illustrates.
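For illustration only (this model is not drawn from the cited sources), the following Python sketch contrasts a single-ported unified cache with a split instruction/data cache by counting the cycles needed to service a simple access trace:

<syntaxhighlight lang="python">
# Illustrative model: with one shared port, a unified cache must
# serialize an instruction fetch and a data access that arrive in the
# same cycle; a split (banked) cache serves one of each in parallel.

def cycles_needed(trace, split):
    """trace: per-cycle (instruction_fetch, data_access) boolean pairs."""
    total = 0
    for wants_instr, wants_data in trace:
        requests = int(wants_instr) + int(wants_data)
        if split:
            total += 1 if requests else 0  # I- and D-ports work in parallel
        else:
            total += requests              # single port: requests serialize
    return total

# Every cycle fetches an instruction; every other cycle also loads data.
trace = [(True, cycle % 2 == 0) for cycle in range(1000)]
print("unified:", cycles_needed(trace, split=False), "cycles")  # 1500
print("split:  ", cycles_needed(trace, split=True), "cycles")   # 1000
</syntaxhighlight>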
 
Modern processors have split L1 caches; in systems with multilevel caches, the caches further from the processor (such as L2 and L3) may be unified while the level closest to the processor remains split.<ref>Alan Clements, 2013. Computer Organization & Architecture: Themes and Variations. Cengage Learning. p. 588. {{ISBN|1-285-41542-6}}.</ref>
 
=== Inclusion policies ===

=== Write policies ===
There are two policies which define the way a modified cache block is updated in main memory: write through and write back.<ref name=":0" />
 
In the case of the write through policy, whenever the value of a cache block changes, the change is propagated to the lower-level memory hierarchy as well.<ref>David A. Patterson; John L. Hennessy; 2017. Computer Organization and Design RISC-V Edition: The Hardware Software Interface. Elsevier Science. pp. 386–387. {{ISBN|978-0-12-812276-1}}.</ref> This policy ensures that the data is stored safely, as every write is reflected throughout the hierarchy.
 
Under the write back policy, in contrast, the changed cache block is updated in the lower-level hierarchy only when the cache block is evicted. A "dirty bit" is attached to each cache block and set whenever the block is modified.<ref>Stefan Goedecker; Adolfy Hoisie; 2001. Performance Optimization of Numerically Intensive Codes. SIAM. p. 11. {{ISBN|978-0-89871-484-5}}.</ref> During eviction, blocks with a set dirty bit are written to the lower-level hierarchy. Under this policy, there is a risk of data loss, as the most recently changed copy of a datum exists only in the cache, so corrective techniques must be employed.
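As a minimal sketch (illustrative only, not taken from the cited sources), the following Python class models a single cache block under both update policies, with a dictionary standing in for the lower-level memory:

<syntaxhighlight lang="python">
# Minimal sketch of the write through and write back update policies
# for a single cache block; "memory" stands in for the lower level.

class CacheBlock:
    def __init__(self, policy):
        self.policy = policy  # "write-through" or "write-back"
        self.data = None
        self.dirty = False    # only meaningful under write-back

    def write(self, value, memory, addr):
        self.data = value
        if self.policy == "write-through":
            memory[addr] = value  # propagate the change immediately
        else:
            self.dirty = True     # defer: just mark the block modified

    def evict(self, memory, addr):
        if self.policy == "write-back" and self.dirty:
            memory[addr] = self.data  # write the modified block back
            self.dirty = False
        self.data = None

memory = {0: 7}
block = CacheBlock("write-back")
block.write(42, memory, 0)
print(memory[0])  # still 7: the update is deferred until eviction
block.evict(memory, 0)
print(memory[0])  # now 42
</syntaxhighlight>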
 
In the case of a write to a block that is not present in the cache, the block may be brought into the cache as determined by a write allocate or write no-allocate policy.<ref name=":0" /> Under the write allocate policy, on a write miss the block is fetched from main memory and placed in the cache before the write is performed.<ref>Harvey G. Cragon, 1996. Memory Systems and Pipelined Processors. Jones & Bartlett Learning. p. 47. {{ISBN|978-0-86720-474-2}}.</ref> Under the write no-allocate policy, if the block misses in the cache, the write is performed in the lower-level memory hierarchy without fetching the block into the cache.<ref>David A. Patterson; John L. Hennessy; 2007. Computer Organization and Design, Revised Printing, Third Edition: The Hardware/Software Interface. Elsevier. p. 484. {{ISBN|978-0-08-055033-6}}.</ref>
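The difference shows up only on a write miss, as in this illustrative Python sketch (again not from the cited sources; the update policy from the previous sketch is omitted for brevity):

<syntaxhighlight lang="python">
# Minimal sketch of the two allocation policies on a write miss.
# "cache" holds the resident blocks; "memory" is the lower level.

def write(addr, value, cache, memory, allocate):
    if addr in cache:               # write hit: update the cached block
        cache[addr] = value
        return
    if allocate:
        cache[addr] = memory[addr]  # write allocate: fetch the block first,
        cache[addr] = value         # then perform the write in the cache
    else:
        memory[addr] = value        # no-allocate: bypass the cache entirely

memory = {0: 7, 1: 9}
cache = {}
write(0, 42, cache, memory, allocate=True)
write(1, 13, cache, memory, allocate=False)
print(cache)   # {0: 42}: only the write-allocate miss brought a block in
print(memory)  # {0: 7, 1: 13}: the no-allocate write went straight below
</syntaxhighlight>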
 
The common combinations of these policies are [[Cache (computing)#Writing policies|write back with write allocate, and write through with write no-allocate]].

=== Shared versus private ===
A private cache is assigned to one particular core in a processor and cannot be accessed by any other core. In some architectures, each core has its own private cache; this creates the risk of duplicate blocks in a system's cache architecture, which results in reduced capacity utilization. However, this type of design choice in a multi-layer cache architecture can also lend itself to lower data-access latency.<ref name=":0" /><ref>{{Cite web|url=https://software.intel.com/en-us/articles/software-techniques-for-shared-cache-multi-core-systems|title=Software Techniques for Shared-Cache Multi-Core Systems}}</ref><ref>{{Cite web|url=http://hpcaconf.org/hpca13/papers/002-dybdahl.pdf|title=An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors}}</ref>
 
A shared cache is a cache which can be accessed by multiple cores.<ref>Akanksha Jain; Calvin Lin; 2019. Cache Replacement Policies. Morgan & Claypool Publishers. p. 45. {{ISBN|978-1-68173-577-1}}.</ref> Because it is shared, each block in the cache is unique, which gives the cache a higher hit rate, as there are no duplicate blocks. However, data-access latency can increase as multiple cores try to access the same cache.<ref>David Culler; Jaswinder Pal Singh; Anoop Gupta; 1999. Parallel Computer Architecture: A Hardware/Software Approach. Gulf Professional Publishing. p. 436. {{ISBN|978-1-55860-343-1}}.</ref>
 
In [[multi-core processor]]s, the design choice to make a cache shared or private impacts the performance of the processor.<ref name="Keckler (2009)">Stephen W. Keckler; Kunle Olukotun; H. Peter Hofstee; 2009. Multicore Processors and Systems. Springer Science & Business Media. p. 182. {{ISBN|978-1-4419-0263-4}}.</ref> In practice, the upper-level cache L1 (or sometimes L2)<ref name=":2" /><ref name=":3" /> is implemented as private and the lower-level caches are implemented as shared. This design provides fast access for the upper-level caches and low miss rates for the lower-level caches.<ref name="Keckler (2009)" />
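On Linux, the private-or-shared arrangement of each cache level can be inspected through sysfs (this sketch assumes a kernel that exposes cacheinfo; paths and output vary by system):

<syntaxhighlight lang="python">
# Print each cache level of CPU 0 and which CPUs share it (Linux only).
# A single CPU in shared_cpu_list indicates a private cache; several
# CPUs indicate a cache shared between cores or hardware threads.
import glob
import os

for index in sorted(glob.glob("/sys/devices/system/cpu/cpu0/cache/index*")):
    def read(name):
        with open(os.path.join(index, name)) as f:
            return f.read().strip()
    print(f"L{read('level')} {read('type'):<12} {read('size'):>6} "
          f"shared by CPUs {read('shared_cpu_list')}")
</syntaxhighlight>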
 
== Recent implementation models ==