Locality of reference
{{Short description|Tendency of a processor to access nearby memory locations in space or time}}
{{more citations needed|date=July 2008}}
In [[computer science]], '''locality of reference''', also known as the '''principle of locality''',<ref>Not to be confused with the [[principle of locality]] that exists in physics.</ref> is the tendency of a processor to access the same set of memory locations repetitively over a short period of time.<ref>{{Cite book|title=Computer organization and architecture : designing for performance|last=Stallings|first=William|date=2010|publisher=Prentice Hall|isbn=9780136073734|edition= 8th|location=Upper Saddle River, NJ|oclc=268788976}}</ref> There are two basic types of reference locality{{snd}}temporal and spatial locality. Temporal locality refers to the reuse of specific data or resources within a relatively short time span. Spatial locality (also termed ''data locality''<ref name="NistBig1">"NIST Big Data Interoperability Framework: Volume 1", [https://doi.org/10.6028/NIST.SP.1500-1r2 urn:doi:10.6028/NIST.SP.1500-1r2]</ref>) refers to the use of data elements within relatively close storage locations. Sequential locality, a special case of spatial locality, occurs when data elements are arranged and accessed linearly, such as when traversing the elements of a one-dimensional [[Array data structure|array]].
 
Locality is a type of [[predictability|predictable]] behavior that occurs in computer systems. Systems that exhibit strong ''locality of reference'' are good candidates for performance optimization through techniques such as [[CPU cache|caching]], [[prefetch instruction|prefetching]] for memory, and advanced [[branch predictor]]s at the [[Pipeline (computing)|pipelining]] stage of a processor core.
 
== Types of locality ==
There are several different types of locality of reference:
 
* '''Temporal locality''': If a particular memory location is referenced at one point, then it is likely that the same location will be referenced again in the near future. There is temporal proximity between adjacent references to the same memory location. In this case, it is common to store a copy of the referenced data in faster memory storage to reduce the latency of subsequent references. Temporal locality is a special case of spatial locality (see below), namely when the prospective location is identical to the present location.
* '''Spatial locality''': If a particular storage location is referenced at a particular time, then it is likely that nearby memory locations will be referenced in the near future. In this case, it is common to attempt to guess the size and shape of the area around the current reference for which it is worthwhile to prepare faster access for subsequent references.
** '''Memory locality''' (or ''data locality''<ref name="NistBig1"/>): Spatial locality explicitly relating to [[computer memory|memory]].
* '''[[Branch (computer science)|Branch]] locality''': Occurs when there are only a few possible alternatives for the prospective part of the path in the spatial-temporal coordinate space. This is the case when an instruction loop has a simple structure, or when the possible outcomes of a small system of conditional branching instructions are restricted to a small set of possibilities. Branch locality is typically not spatial locality, since the few possibilities can be located far away from each other.
* '''Equidistant locality''': Halfway between spatial locality and branch locality. Consider a loop accessing locations in an equidistant pattern, i.e., the path in the spatial-temporal coordinate space is a dotted line. In this case, a simple linear function can predict which location will be accessed in the near future.
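The linear prediction described for equidistant locality can be illustrated with a small C sketch of a stride predictor. This is a simplified illustration under stated assumptions, not a model of any particular processor: the type and function names (<code>stride_predictor</code>, <code>sp_observe</code>, <code>sp_predict</code>, <code>sp_count_hits</code>) are hypothetical, addresses are plain integers, and the predictor simply assumes that the next address equals the last address plus the most recently observed stride.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stride predictor: guesses that the next address equals the
   last address plus the most recently observed stride. */
typedef struct {
    long last;   /* most recent address seen */
    long stride; /* difference between the two most recent addresses */
    int seen;    /* addresses observed so far, capped at 2 */
} stride_predictor;

static void sp_init(stride_predictor *p) {
    p->last = 0;
    p->stride = 0;
    p->seen = 0;
}

/* Records one access and returns 1 if it matched the prediction that the
   predictor would have made just before it, 0 otherwise. */
static int sp_observe(stride_predictor *p, long addr) {
    int hit = (p->seen >= 2) && (addr == p->last + p->stride);
    if (p->seen >= 1)
        p->stride = addr - p->last;
    p->last = addr;
    if (p->seen < 2)
        p->seen++;
    return hit;
}

/* Predicted next address; needs at least two observations to know a stride. */
static long sp_predict(const stride_predictor *p) {
    assert(p->seen >= 2);
    return p->last + p->stride;
}

/* Counts how many accesses in an address trace were predicted correctly. */
static size_t sp_count_hits(const long *trace, size_t n) {
    stride_predictor p;
    sp_init(&p);
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (size_t)sp_observe(&p, trace[i]);
    return hits;
}
```

For a trace such as 100, 108, 116, 124, 132 (a "dotted line" with stride 8), every access from the third onward is predicted correctly, while an irregular trace yields few or no correct predictions.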
 
Most information storage systems are [[Computer data storage#Hierarchy of storage|hierarchical]] in order to benefit from temporal and spatial locality, which occur frequently. Equidistant locality is usually supported by a processor's diverse nontrivial increment instructions. For branch locality, contemporary processors have sophisticated branch predictors, and on the basis of these predictions the memory manager of the processor tries to collect and preprocess the data of plausible alternatives.
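As a rough illustration of how branch locality makes prediction possible, the following C sketch implements a single two-bit saturating counter, a classic textbook scheme. Contemporary branch predictors are far more elaborate, so this is only a sketch: the names <code>two_bit_counter</code>, <code>tbc_predict</code>, <code>tbc_update</code> and <code>tbc_mispredictions</code> are hypothetical, and branch outcomes are assumed to be supplied as an array of taken (non-zero) or not-taken (zero) flags.

```c
#include <assert.h>
#include <stddef.h>

/* Two-bit saturating counter: states 0 and 1 predict "not taken",
   states 2 and 3 predict "taken". */
typedef struct {
    unsigned state;
} two_bit_counter;

static int tbc_predict(const two_bit_counter *c) {
    return c->state >= 2;
}

/* Move one step toward the actual outcome, saturating at 0 and 3. */
static void tbc_update(two_bit_counter *c, int taken) {
    if (taken && c->state < 3)
        c->state++;
    else if (!taken && c->state > 0)
        c->state--;
}

/* Runs one counter over a branch-outcome history and returns how many
   outcomes it predicted incorrectly. */
static size_t tbc_mispredictions(const int *outcomes, size_t n, unsigned initial) {
    assert(initial <= 3);
    two_bit_counter c = { initial };
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        int taken = outcomes[i] != 0;
        if (tbc_predict(&c) != taken)
            misses++;
        tbc_update(&c, taken);
    }
    return misses;
}
```

For a loop branch that is taken nine times and then falls through, with the whole loop executed twice, a counter starting in the weakly-taken state (2) mispredicts only the two loop exits: the small set of likely outcomes is exactly what branch locality provides.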
 
== Relevance ==
There are several reasons for locality. These reasons are either goals to achieve or circumstances to accept, depending on the aspect. The reasons below are not [[Disjoint sets|disjoint]]; in fact, the list below goes from the most general case to special cases:
 
* '''Structure of the program''': Locality occurs often because of the way in which computer programs are created, for handling decidable problems. Generally, related data is stored in nearby locations in storage. One common pattern in computing involves the processing of several items, one at a time. This means that if a lot of processing is done, the single item will be accessed more than once, thus leading to temporal locality of reference. Furthermore, moving to the next item implies that the next item will be read, hence spatial locality of reference, since memory locations are typically read in batches.
* '''Linear data structures''': Locality often occurs because code contains loops that tend to reference arrays or other data structures by indices. Sequential locality, a special case of spatial locality, occurs when relevant data elements are arranged and accessed linearly. For example, the simple traversal of elements in a one-dimensional array, from the base address to the highest element would exploit the sequential locality of the array in memory.<ref>Aho, Lam, Sethi, and Ullman. "Compilers: Principles, Techniques & Tools" 2nd ed. Pearson Education, Inc. 2007</ref> Equidistant locality occurs when the linear traversal is over a longer area of adjacent [[data structure]]s with identical structure and size, accessing mutually corresponding elements of each structure rather than each entire structure. This is the case when a [[Matrix (mathematics)|matrix]] is represented as a sequential matrix of rows and the requirement is to access a single column of the matrix.
* '''Efficiency of memory hierarchy use''': Although [[random-access memory]] presents the programmer with the ability to read or write anywhere at any time, in practice [[latency (engineering)|latency]] and throughput are affected by the efficiency of the [[Cache (computing)|cache]], which is improved by increasing the locality of reference. Poor locality of reference results in cache [[Thrashing (computer science)|thrashing]] and [[cache pollution]]; to avoid these, data elements with poor locality can be made to bypass the cache.<ref>"[https://www.academia.edu/24842555/A_Survey_of_Cache_Bypassing_Techniques A Survey Of Cache Bypassing Techniques]", JLPEA, vol. 6, no. 2, 2016</ref>
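The effect of the access pattern on cache efficiency, including the sequential and column-wise (equidistant) matrix traversals mentioned above, can be sketched with a toy direct-mapped cache in C. The parameters are assumptions chosen for illustration (64-byte lines, 8 lines, one byte address per access, no writes or prefetching), and the names <code>dm_cache</code>, <code>cache_init</code>, <code>cache_access</code> and <code>misses_for_stride</code> are hypothetical; real caches are set-associative and much larger.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u /* assumed cache-line size in bytes */
#define NUM_LINES  8u  /* assumed number of lines (a 512-byte cache) */

typedef struct {
    uint64_t tag[NUM_LINES];
    int valid[NUM_LINES];
} dm_cache;

static void cache_init(dm_cache *c) {
    for (unsigned i = 0; i < NUM_LINES; i++) {
        c->tag[i] = 0;
        c->valid[i] = 0;
    }
}

/* Accesses one byte address. Returns 1 on a miss (and loads the line),
   0 on a hit. Each memory block maps to exactly one cache line. */
static int cache_access(dm_cache *c, uint64_t addr) {
    uint64_t block = addr / LINE_BYTES;
    unsigned index = (unsigned)(block % NUM_LINES);
    uint64_t tag = block / NUM_LINES;
    if (c->valid[index] && c->tag[index] == tag)
        return 0;
    c->valid[index] = 1;
    c->tag[index] = tag;
    return 1;
}

/* Counts misses when touching `count` addresses spaced `stride` bytes apart,
   starting from an empty cache. */
static size_t misses_for_stride(uint64_t stride, size_t count) {
    assert(stride > 0);
    dm_cache c;
    cache_init(&c);
    size_t misses = 0;
    for (size_t i = 0; i < count; i++)
        misses += (size_t)cache_access(&c, i * stride);
    return misses;
}
```

Under these assumptions, 1024 sequential 4-byte accesses miss only 64 times (once per 64-byte line), 1024 accesses with a 64-byte stride miss every time, and alternating between two addresses 512 bytes apart thrashes a single line so that every access misses.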
 
== General usage ==
== Spatial and temporal locality usage ==
 
=== Hierarchical memory ===
{{main|Memory hierarchy}}
 
 
Typical memory hierarchy (access times and cache sizes are approximations of typical values used {{As of|2013|lc=on}} for the purpose of discussion; actual values and actual numbers of levels in the hierarchy vary):
* [[CPU register]]s (8–256 registers) &ndash; immediate access, with the speed of the innermost core of the processor
* L1 [[CPU cache]]s (32&nbsp;KB to 512&nbsp;[[kilobyte|KB]]) &ndash; fast access, with the speed of the innermost memory bus owned exclusively by each core
* L2 CPU caches (128&nbsp;KB to 24&nbsp;[[megabyte|MB]]) &ndash; slightly slower access, with the speed of the [[memory bus]] shared between pairs of cores
* L3 CPU caches (2&nbsp;MB to 64&nbsp;[[megabyte|MB]]) &ndash; even slower access, with the speed of the memory bus shared between even more cores of the same processor
* Main [[physical memory]] ([[random-access memory|RAM]]) (256&nbsp;MB to 64&nbsp;[[gigabyte|GB]]) &ndash; slow access, the speed of which is limited by the spatial distances and general hardware interfaces between the processor and the memory modules on the [[motherboard]]
* Disk ([[virtual memory]], [[file system]]) (1&nbsp;GB to 256&nbsp;[[terabyte|TB]]) &ndash; very slow, due to the narrower (in bit width), physically much longer data channel between the main board of the computer and the disk devices, and due to the extraneous software protocol needed on top of the slow hardware interface
* Remote memory (other computers or the cloud) (practically unlimited) &ndash; speed varies from very slow to extremely slow
 
</syntaxhighlight>
 
The reason for this speedup is that in the first case, the reads of <code>A[i][k]</code> are in cache (since the <code>k</code> index is the contiguous, last dimension), but <code>B[k][j]</code> is not, so there is a cache miss penalty on <code>B[k][j]</code>. <code>C[i][j]</code> is irrelevant, because it can be [[Loop-invariant code motion|hoisted]] out of the inner loop, whose loop variable is <code>k</code>.
 
<syntaxhighlight lang="pascal" line="1">
for i in 0..n
  for j in 0..m
    temp = C[i][j]
    for k in 0..p
      temp = temp + A[i][k] * B[k][j];
    C[i][j] = temp
</syntaxhighlight>
 
In the second case, the reads and writes of <code>C[i][j]</code> are both in cache, the reads of <code>B[k][j]</code> are in cache, and the read of <code>A[i][k]</code> can be hoisted out of the inner loop.
 
<syntaxhighlight lang="pascal" line="1">
for i in 0..n
  for k in 0..p
    temp = A[i][k]
    for j in 0..m
      C[i][j] = C[i][j] + temp * B[k][j];
</syntaxhighlight>
 
Thus, the second example has no cache miss penalty in the inner loop while the first example has a cache penalty.
 
On a 2014-era processor, the second case is approximately five times faster than the first case when written in [[C (programming language)|C]] and compiled with <code>gcc -O3</code>. (A careful examination of the disassembled code shows that in the first case [[GNU Compiler Collection|GCC]] uses [[SIMD]] instructions and in the second case it does not, but the cache penalty is much worse than the SIMD gain.){{Citation needed|date=September 2014}}
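The two loop orders can be written in C roughly as follows. This is a sketch for illustration, not the code behind the timing claim, and no timing is measured here: the function names <code>matmul_ijk</code> and <code>matmul_ikj</code> are hypothetical, the matrices are assumed to be flat row-major <code>double</code> arrays (<code>A</code> is n × p, <code>B</code> is p × m, <code>C</code> is n × m), and both functions accumulate into <code>C</code>, so <code>C</code> should start zeroed.

```c
#include <assert.h>
#include <stddef.h>

/* First case: the innermost loop runs over k, so B[k][j] is read with a
   stride of m elements. C[i][j] is hoisted into `temp`. */
static void matmul_ijk(size_t n, size_t m, size_t p,
                       const double *A, const double *B, double *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++) {
            double temp = C[i * m + j];
            for (size_t k = 0; k < p; k++)
                temp += A[i * p + k] * B[k * m + j];
            C[i * m + j] = temp;
        }
}

/* Second case: the innermost loop runs over j, so both B[k][j] and C[i][j]
   are walked sequentially. A[i][k] is hoisted into `temp`. */
static void matmul_ikj(size_t n, size_t m, size_t p,
                       const double *A, const double *B, double *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < p; k++) {
            double temp = A[i * p + k];
            for (size_t j = 0; j < m; j++)
                C[i * m + j] += temp * B[k * m + j];
        }
}
```

Both versions compute the same product; only the order of memory accesses in the innermost loop differs, which is what drives the cache behavior described above.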
 
Temporal locality can also be improved in the above example by using a technique called [[Loop blocking|blocking]]. The larger matrix can be divided into evenly sized sub-matrices, so that the smaller blocks can be referenced (multiplied) several times while they are held in the cache. Note that this example works for square matrices of dimensions SIZE × SIZE, but it can easily be extended to arbitrary matrices by substituting SIZE_I, SIZE_J and SIZE_K where appropriate.
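Separately from the pseudocode, a minimal C sketch of the blocking idea is shown here, assuming square <code>size</code> × <code>size</code> row-major <code>double</code> matrices. The tile size <code>BLOCK</code> and the names <code>matmul_blocked</code> and <code>min_sz</code> are hypothetical; <code>BLOCK</code> is deliberately tiny for demonstration, whereas real code would tune it so that the tiles being worked on fit in cache. Edge tiles are clipped, so <code>size</code> need not be a multiple of <code>BLOCK</code>, and the function accumulates into <code>C</code>, which should start zeroed.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 2 /* hypothetical tile size, tiny for demonstration */

static size_t min_sz(size_t a, size_t b) {
    return a < b ? a : b;
}

/* C += A * B for square size x size row-major matrices, processed in
   BLOCK x BLOCK tiles so each tile is reused while it is likely cached. */
static void matmul_blocked(size_t size, const double *A, const double *B, double *C) {
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t ii = 0; ii < size; ii += BLOCK)
        for (size_t kk = 0; kk < size; kk += BLOCK)
            for (size_t jj = 0; jj < size; jj += BLOCK)
                /* Multiply one pair of tiles using the i-k-j order inside it. */
                for (size_t i = ii; i < min_sz(ii + BLOCK, size); i++)
                    for (size_t k = kk; k < min_sz(kk + BLOCK, size); k++) {
                        double a = A[i * size + k];
                        for (size_t j = jj; j < min_sz(jj + BLOCK, size); j++)
                            C[i * size + j] += a * B[k * size + j];
                    }
}
```

Within each tile the loops keep the sequential i-k-j access order, so blocking adds temporal reuse of whole tiles on top of the spatial locality of the inner loop.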
 
<syntaxhighlight lang="pascal" line="1">
 
* [[Cache-oblivious algorithm]]
* [[Communication-avoiding algorithm]]
* [[File system fragmentation]]
* [[Partitioned global address space]]
* [[Scratchpad memory]]
* [[Working set]]
* [[Heuristic]]
* [[Locality-sensitive hashing]]
 
== References ==
== Bibliography ==
* [[Peter J. Denning]], [http://denninginstitute.com/pjd/PUBS/CACMcols/cacmJul05.pdf "The Locality Principle"], ''Communications of the ACM'', Volume 48, Issue 7 (2005), Pages 19–24
* Peter J. Denning, Stuart C. Schwartz, [http://denninginstitute.com/pjd/PUBS/WSProp_1972.pdf "Properties of the Working-Set Model"], ''Communications of the ACM'', Volume 15, Issue 3 (March 1972), Pages 191–198
 
{{DEFAULTSORT:Locality Of Reference}}