Cache placement policies: Difference between revisions

Revision as of 01:37, 22 December 2020 by Monkbot (Task 18 (cosmetic): eval 9 templates: del empty params (15×)); latest revision as of 04:04, 21 August 2025 by InternetArchiveBot (Rescuing 1 sources and tagging 0 as dead. #IABot (v2.0.9.5))
(46 intermediate revisions by 31 users not shown)
{{Short description|Design decisions affecting processor cache speeds and sizes}}
{{Distinguish|cache replacement policies}}
'''Cache placement policies''' are policies that determine where a particular memory block can be placed when it goes into a [[CPU cache]]. A block of memory cannot necessarily be placed at an arbitrary location in the cache; it may be restricted to a particular [[CPU cache#Cache entries|cache line]] or a set of cache lines<ref name=":0">{{Cite web|url=https://cseweb.ucsd.edu/classes/su07/cse141/cache-handout.pdf|title=The Basics of Cache}}</ref> by the cache's placement policy.<ref>{{Cite web |title=Cache Placement Policies |url=http://web.cs.iastate.edu/~prabhu/Tutorial/CACHE/bl_place.html |archive-url=https://web.archive.org/web/20200221213947/http://web.cs.iastate.edu/~prabhu/Tutorial/CACHE/bl_place.html |archive-date=Feb 21, 2020 |url-status=dead}}</ref><ref>{{Cite web|url=http://fourier.eng.hmc.edu/e85_old/lectures/memory/node4.html|title=Placement Policies|archive-url=https://web.archive.org/web/20200814000302/http://fourier.eng.hmc.edu/e85_old/lectures/memory/node4.html|archive-date=August 14, 2020|url-status=dead}}</ref>

There are three different policies available for placement of a memory block in the cache: direct-mapped, fully associative, and set-associative. Originally this space of cache organizations was described using the term "congruence mapping".<ref>{{Cite journal|last=Mattson|first=R.L.|author1-link=Richard Mattson|last2=Gecsei|first2=J.|last3=Slutz|first3=D. R.|last4=Traiger|first4=I|date=1970|title=Evaluation Techniques for Storage Hierarchies|journal=IBM Systems Journal|volume=9|issue=2|pages=78–117|doi=10.1147/sj.92.0078}}</ref>

== Direct-mapped cache ==
In a direct-mapped cache structure, the cache is organized into multiple sets<ref name=":0" /> with a single cache line per set. Based on its address, a memory block can occupy only a single cache line. The cache can be framed as a ({{math|''n'' × 1}}) column matrix.<ref name=":1">{{Cite book|title=Fundamentals of Parallel Multi-core Architecture|last=Solihin|first=Yan|publisher=Taylor & Francis|year=2015|isbn=978-1482211184|pages=136–141}}</ref>

=== To place a block in the cache ===
* The set is determined by the [[CPU cache#Cache entry structure|index]]<ref name=":0" /> bits derived from the address of the memory block.
* The memory block is placed in the identified set, and the [[CPU cache#Cache entry structure|tag]]<ref name=":0" /> is stored in the tag field associated with the set.
* If the cache line is already occupied, the new data replaces the memory block in the cache.

=== To search a word in the cache ===
* The set is identified by the index bits of the address.
* The tag bits derived from the memory block address are compared with the tag bits associated with the set. If the tags match, there is a [[CPU cache#Cache entries|cache hit]] and the cache block is returned to the processor. Otherwise, there is a [[CPU cache#Cache miss|cache miss]] and the memory block is fetched from the lower memory ([[Computer data storage#Primary storage|main memory]], [[Computer data storage#Secondary storage|disk]]).

=== Advantages ===
* This placement policy is power efficient, as it avoids searching through all the cache lines.
* The placement policy and the [[CPU cache#Replacement policies|replacement policy]] are simple.
* Simple and low-cost hardware can be used, as only one tag needs to be checked at a time.

=== Disadvantage ===
* It has a lower cache hit rate, as there is only one cache line available in a set. Every time a new memory block that maps to the same set is referenced, the cache line is replaced, which causes a conflict miss.<ref>{{Cite web|url=http://meseec.ce.rit.edu/eecc551-winter2001/551-1-30-2002.pdf|title=Cache Miss Types|access-date=2016-10-24|archive-date=2016-11-30|archive-url=https://web.archive.org/web/20161130184519/http://meseec.ce.rit.edu/eecc551-winter2001/551-1-30-2002.pdf|url-status=dead}}</ref>

=== Example ===
[[File:Direct-Mapped Cache Snehal Img.png|thumb|500x500px|Direct-Mapped Cache]]
Consider a main memory of 16 kilobytes, which is organized as 4-byte blocks, and a direct-mapped cache of 256 bytes with a block size of 4 bytes. Because the main memory is 16 kB, we need a minimum of 14 bits to uniquely represent a memory address.

Since each cache block is of size 4 bytes, the total number of sets in the cache is 256/4, which equals 64 sets.

The incoming address to the cache is divided into bits for [[CPU cache#Cache entry structure|Offset]], [[CPU cache#Cache entry structure|Index]] and [[CPU cache#Cache entry structure|Tag]].
* ''Offset'' corresponds to the bits used to determine the byte to be accessed from the cache line. Because the cache lines are 4 bytes long, there are ''2 offset bits''.
* ''Index'' corresponds to the bits used to determine the set of the cache. There are 64 sets in the cache, and because 2^6 = 64, there are ''6 index bits''.
* ''Tag'' corresponds to the remaining bits. This means there are 14 – (6+2) = ''6 tag bits'', which are stored in the tag field to match the address on a cache request.

Below are memory addresses and an explanation of which cache line they map to:
# Address <code>0x0000</code> (tag – <code>0b00_0000</code>, index – <code>0b00_0000</code>, offset – <code>0b00</code>) corresponds to block 0 of the memory and maps to set 0 of the cache.
# Address <code>0x0004</code> (tag – <code>0b00_0000</code>, index – <code>0b00_0001</code>, offset – <code>0b00</code>) corresponds to block 1 of the memory and maps to set 1 of the cache.
# Address <code>0x00FF</code> (tag – <code>0b00_0000</code>, index – <code>0b11_1111</code>, offset – <code>0b11</code>) corresponds to block 63 of the memory and maps to set 63 of the cache.
# Address <code>0x0100</code> (tag – <code>0b00_0001</code>, index – <code>0b00_0000</code>, offset – <code>0b00</code>) corresponds to block 64 of the memory and maps to set 0 of the cache.

== Fully associative cache ==
In a fully associative cache, the cache is organized into a single cache set with multiple cache lines. A memory block can occupy any of the cache lines. The cache organization can be framed as a ({{math|1 × ''m''}}) row matrix.<ref name=":1" />

=== To place a block in the cache ===
* The cache line is selected based on the [[CPU cache#Flag bits|valid bit]]<ref name=":0" /> associated with it. If the valid bit is 0, the new memory block can be placed in the cache line; otherwise it has to be placed in another cache line whose valid bit is 0.
* If the cache is completely occupied, a block is evicted and the memory block is placed in that cache line.
* The eviction of a memory block from the cache is decided by the [[CPU cache#Replacement policies|replacement policy]].<ref>{{Cite web|url=http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/fully.html|title=Fully Associative Cache|archive-url=https://web.archive.org/web/20171224054857/http://www.cs.umd.edu/class/sum2003/cmsc311/Notes/Memory/fully.html|archive-date=December 24, 2017|url-status=dead}}</ref>

=== To search a word in the cache ===
* The Tag field of the memory address is compared with the tag bits associated with all the cache lines. If it matches one of them, the block is present in the cache and it is a cache hit. If it does not match, it is a cache miss and the block has to be fetched from the lower memory.
* Based on the Offset, a byte is selected and returned to the processor.
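
The placement and search steps for a fully associative cache can be sketched in a few lines of Python. This is only an illustrative model, not a description of real hardware, which compares all tags in parallel rather than in a loop. The class and method names are hypothetical, the 2 offset bits match the 4-byte blocks used in this article's examples, and first-in, first-out eviction is assumed purely for simplicity; any replacement policy could be substituted.

```python
from dataclasses import dataclass
from typing import List, Optional

OFFSET_BITS = 2  # assumption: 4-byte cache lines, as in this article's examples


@dataclass
class Line:
    valid: bool = False  # valid bit: False means the line is free
    tag: int = 0


class FullyAssociativeCache:
    """Toy model: a single set of `num_lines` lines with FIFO eviction (assumed)."""

    def __init__(self, num_lines: int) -> None:
        self.lines: List[Line] = [Line() for _ in range(num_lines)]
        self.next_victim = 0  # FIFO pointer, used only once the cache is full

    def find(self, address: int) -> Optional[int]:
        """Compare the address tag with every valid line; return the hit index."""
        tag = address >> OFFSET_BITS  # no index bits: everything above the offset is tag
        for i, line in enumerate(self.lines):
            if line.valid and line.tag == tag:
                return i
        return None

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, place the block and return False."""
        if self.find(address) is not None:
            return True
        tag = address >> OFFSET_BITS
        # Prefer a line whose valid bit is 0.
        for line in self.lines:
            if not line.valid:
                line.valid, line.tag = True, tag
                return False
        # Cache is full: evict the line chosen by the (assumed FIFO) policy.
        self.lines[self.next_victim].tag = tag
        self.next_victim = (self.next_victim + 1) % len(self.lines)
        return False
```

With a two-line cache, accessing <code>0x0000</code> and <code>0x0100</code> fills both lines, after which <code>0x0000</code> and <code>0x0002</code> (same block) hit regardless of which line holds the block.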
[[File:Fully-Associative Cache Snehal Img.png|thumb|513x513px|Fully associative cache]]

=== Advantages ===
* The fully associative cache structure provides the flexibility of placing a memory block in any of the cache lines, and hence full utilization of the cache.
* The placement policy provides a better cache hit rate.
* It offers the flexibility of utilizing a wide variety of [[CPU cache#Replacement policies|replacement algorithms]] if a cache miss occurs.

=== Disadvantages ===
* The placement policy is power hungry, as the comparison circuitry has to run over the entire cache to locate a block.
* It is the most expensive of all methods, due to the high cost of associative-comparison hardware.

=== Example ===
Consider a main memory of 16 kilobytes, which is organized as 4-byte blocks, and a fully associative cache of 256 bytes with a block size of 4 bytes. Because the main memory is 16 kB, we need a minimum of 14 bits to uniquely represent a memory address.

The total number of sets in the cache is 1, and the set contains 256/4 = 64 cache lines, as each cache block is of size 4 bytes.

The incoming address to the cache is divided into bits for offset and tag.
* ''Offset'' corresponds to the bits used to determine the byte to be accessed from the cache line. In the example, there are ''2 offset bits'', which are used to address the 4 bytes of the cache line.
* ''Tag'' corresponds to the remaining bits. This means there are 14 – 2 = ''12 tag bits'', which are stored in the tag field to match the address on a cache request.

Since any block of memory can be mapped to any cache line, the memory block can occupy one of the cache lines based on the replacement policy.

== Set-associative cache ==
Set-associative cache is a trade-off between direct-mapped cache and fully associative cache.

A set-associative cache can be imagined as an ({{math|''n'' × ''m''}}) matrix. The cache is divided into ''n'' sets and each set contains ''m'' cache lines. A memory block is first mapped onto a set and then placed into any cache line of the set.

The range of caches from direct-mapped to fully associative is a continuum of levels of set associativity. (A direct-mapped cache is one-way set-associative and a fully associative cache with ''m'' cache lines is ''m''-way set-associative.)

=== To locate a word in the cache ===
* The set is determined by the index bits derived from the address of the memory block.
* The tag bits are compared with the tags of all cache lines present in the selected set. If the tag matches any of the cache lines, it is a cache hit and the appropriate line is returned. If the tag does not match any of the lines, it is a cache miss and the data is requested from the next level in the memory hierarchy.

=== Advantages ===

=== Disadvantages ===
* The placement policy will not effectively use all the available cache lines in the cache and suffers from [[Cache performance measurement and metric#Conflict misses|conflict misses]].
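
The effect of associativity on conflict misses can be illustrated with a small simulation. The sketch below is a toy model under stated assumptions: the class <code>SetAssociativeCache</code> and the helper <code>count_misses</code> are hypothetical names, blocks are 4 bytes as in this article's examples, and least-recently-used replacement within each set is assumed (the article does not prescribe a replacement policy). Setting <code>ways=1</code> gives a direct-mapped cache, and <code>num_sets=1</code> gives a fully associative one.

```python
from collections import OrderedDict

OFFSET_BITS = 2  # assumption: 4-byte blocks, as in this article's examples


class SetAssociativeCache:
    """Toy n-set, m-way cache with LRU replacement inside each set (assumed)."""

    def __init__(self, num_sets: int, ways: int) -> None:
        self.num_sets = num_sets
        self.ways = ways
        # One ordered mapping per set; keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, address: int) -> bool:
        """Return True on a hit, False on a miss (the block is then placed)."""
        block = address >> OFFSET_BITS
        index = block % self.num_sets   # selects the set
        tag = block // self.num_sets    # identifies the block within the set
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(lines) == self.ways:
            lines.popitem(last=False)   # set is full: evict the LRU line
        lines[tag] = None
        return False


def count_misses(cache: SetAssociativeCache, addresses) -> int:
    """Replay an address trace and count the misses."""
    return sum(0 if cache.access(a) else 1 for a in addresses)


# Two blocks that map to the same set, accessed alternately.
trace = [0x0000, 0x0100] * 4
direct_mapped = SetAssociativeCache(num_sets=64, ways=1)  # 256-byte cache
two_way = SetAssociativeCache(num_sets=32, ways=2)        # 256-byte cache
print(count_misses(direct_mapped, trace))  # 8: every access is a conflict miss
print(count_misses(two_way, trace))        # 2: only the two initial misses
```

Both caches hold 256 bytes, yet the alternating trace evicts on every access in the direct-mapped configuration, while the 2-way configuration keeps both blocks in set 0.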

=== Example ===
Consider a main memory of 16 kilobytes, which is organized as 4-byte blocks, and a 2-way set-associative cache of 256 bytes with a block size of 4 bytes. Because the main memory is 16 kB, we need a minimum of 14 bits to uniquely represent a memory address.
[[File:Set-Associative Cache Snehal Img.png|thumb|578x578px|Set-Associative Cache]]
Since each cache block is of size 4 bytes and the cache is 2-way set-associative, the total number of sets in the cache is 256/(4 × 2), which equals 32 sets.

The incoming address to the cache is divided into bits for Offset, Index and Tag.
* ''Offset'' corresponds to the bits used to determine the byte to be accessed from the cache line. Because the cache lines are 4 bytes long, there are ''2 offset bits''.
* ''Index'' corresponds to the bits used to determine the set of the cache. There are 32 sets in the cache, and because 2^5 = 32, there are ''5 index bits''.
* ''Tag'' corresponds to the remaining bits. This means there are 14 – (5+2) = ''7 tag bits'', which are stored in the tag field to match the address on a cache request.

Below are memory addresses and an explanation of which set they map to:
# Address <code>0x0000</code> (tag – <code>0b000_0000</code>, index – <code>0b0_0000</code>, offset – <code>0b00</code>) corresponds to block 0 of the memory and maps to set 0 of the cache. The block occupies a cache line in set 0, determined by the replacement policy for the cache.
# Address <code>0x0004</code> (tag – <code>0b000_0000</code>, index – <code>0b0_0001</code>, offset – <code>0b00</code>) corresponds to block 1 of the memory and maps to set 1 of the cache. The block occupies a cache line in set 1, determined by the replacement policy for the cache.
# Address <code>0x00FF</code> (tag – <code>0b000_0001</code>, index – <code>0b1_1111</code>, offset – <code>0b11</code>) corresponds to block 63 of the memory and maps to set 31 of the cache. The block occupies a cache line in set 31, determined by the replacement policy for the cache.
# Address <code>0x0100</code> (tag – <code>0b000_0010</code>, index – <code>0b0_0000</code>, offset – <code>0b00</code>) corresponds to block 64 of the memory and maps to set 0 of the cache. The block occupies a cache line in set 0, determined by the replacement policy for the cache.
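
The tag, index and offset fields used in the examples of this article can be reproduced with simple bit operations. The following is a minimal sketch; the function name <code>split_address</code> and its parameters are hypothetical, and it assumes that the cache size, block size and number of ways are all powers of two.

```python
def split_address(address: int, cache_bytes: int, block_bytes: int, ways: int):
    """Split an address into (tag, index, offset).

    Assumes cache_bytes, block_bytes and ways are powers of two.
    ways=1 models a direct-mapped cache; ways equal to the number of
    cache lines models a fully associative cache (zero index bits).
    """
    num_sets = cache_bytes // (block_bytes * ways)
    offset_bits = block_bytes.bit_length() - 1  # log2(block_bytes)
    index_bits = num_sets.bit_length() - 1      # log2(num_sets)
    offset = address & (block_bytes - 1)
    index = (address >> offset_bits) & (num_sets - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset


# Direct-mapped, 256-byte cache, 4-byte blocks: 6 tag, 6 index, 2 offset bits.
print(split_address(0x00FF, 256, 4, ways=1))   # (0, 63, 3)
print(split_address(0x0100, 256, 4, ways=1))   # (1, 0, 0)

# 2-way set-associative: 7 tag, 5 index, 2 offset bits.
print(split_address(0x00FF, 256, 4, ways=2))   # (1, 31, 3)
print(split_address(0x0100, 256, 4, ways=2))   # (2, 0, 0)

# Fully associative (64 ways): 12 tag bits, no index bits.
print(split_address(0x00FF, 256, 4, ways=64))  # (63, 0, 3)
```

The printed tuples agree with the tag, index and offset values listed for addresses <code>0x00FF</code> and <code>0x0100</code> in the direct-mapped and 2-way set-associative examples.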

== Two-way skewed associative cache ==
Other schemes have been suggested, such as the ''skewed cache'',<ref name="Seznec">{{cite journal|author=André Seznec|author-link=André Seznec|year=1993|title=A Case for Two-Way Skewed-Associative Caches|journal=ACM SIGARCH Computer Architecture News|volume=21|issue=2|pages=169–178|doi=10.1145/173682.165152|doi-access=free}}</ref> where the index for way 0 is direct, as above, but the index for way 1 is formed with a [[hash function]]. A good hash function has the property that addresses which conflict with the direct mapping tend not to conflict when mapped with the hash function, and so it is less likely that a program will suffer from an unexpectedly large number of conflict misses due to a pathological access pattern. The downside is extra latency from computing the hash function.<ref name="CK">{{cite web|url=http://www.stanford.edu/class/ee282/08_handouts/L03-Cache.pdf|title=Lecture 3: Advanced Caching Techniques|author=C. Kozyrakis|author-link=Christos Kozyrakis|archive-url=https://web.archive.org/web/20120907012034/http://www.stanford.edu/class/ee282/08_handouts/L03-Cache.pdf|archive-date=September 7, 2012|url-status=dead}}</ref> Additionally, when it comes time to load a new line and evict an old line, it may be difficult to determine which existing line was least recently used, because the new line conflicts with data at different indexes in each way; [[Cache algorithms|LRU]] tracking for non-skewed caches is usually done on a per-set basis. Nevertheless, skewed-associative caches have major advantages over conventional set-associative ones.<ref>[http://www.irisa.fr/caps/PROJECTS/Architecture/ Micro-Architecture] "Skewed-associative caches have ... major advantages over conventional set-associative caches."</ref>

== Pseudo-associative cache ==
A true set-associative cache tests all the possible ways simultaneously, using something like a [[content-addressable memory]].
A pseudo-associative cache tests each possible way one at a time. A hash-rehash cache and a column-associative cache are examples of a pseudo-associative cache.

In the common case of finding a hit in the first way tested, a pseudo-associative cache is as fast as a direct-mapped cache, but it has a much lower conflict miss rate than a direct-mapped cache, closer to the miss rate of a fully associative cache.<ref name="CK" />

== See also ==
* [[CPU cache#Associativity|Associativity]]
* [[Cache algorithms|Cache replacement policy]]
* [[Cache hierarchy]]
* [[Cache (computing)#Writing policies|Writing Policies]]
* [[Cache coloring]]