In [[computing]], a '''memory access pattern''' or '''IO access pattern''' is the pattern with which a system or program reads and writes [[Memory (computing)|memory]] on [[secondary storage]]. These patterns differ in the level of [[locality of reference]] and drastically affect [[Cache (computing)|cache]] performance,<ref name=":0">{{cite web |title=Introduction to Data-Oriented Design |url=http://www.dice.se/wp-content/uploads/2014/12/Introduction_to_Data-Oriented_Design.pdf |url-status=dead |archive-url=https://web.archive.org/web/20191116014412/http://www.dice.se/wp-content/uploads/2014/12/Introduction_to_Data-Oriented_Design.pdf |archive-date=2019-11-16}}</ref> and also have implications for the approach to [[parallelism (computing)|parallelism]]<ref>{{cite journal|last1=Jang|first1=Byunghyun|last2=Schaa|first2=Dana|last3=Mistry|first3=Perhaad|last4=Kaeli|first4=David|name-list-style=amp|date=2010-05-27|title=Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures|journal=IEEE Transactions on Parallel and Distributed Systems|publisher=[[Institute of Electrical and Electronics Engineers|IEEE]]|___location=New York|volume=22|issue=1|pages=105–118|eissn=1558-2183|doi=10.1109/TPDS.2010.107|s2cid=15997131|issn=1045-9219|id=NLM unique id 101212014}}</ref><ref>{{cite book |last1=Jeffers |first1=James |url=https://books.google.com/books?id=DDpUCwAAQBAJ&q=scatter+memory+access+pattern&pg=PA231 |title=Intel Xeon Phi Processor High Performance Programming: Knights Landing Edition |last2=Reinders |first2=James |last3=Sodani |first3=Avinash |date=2016-05-31 |publisher=Morgan Kaufmann |isbn=9780128091951 |edition=2nd}}</ref> and the distribution of workload in [[shared memory system]]s.<ref>{{Cite book |last1=Jana |first1=Siddhartha |last2=Schuchart |first2=Joseph |last3=Chapman |first3=Barbara|author3-link=Barbara Chapman |chapter=Analysis of Energy and Performance of PGAS-based Data Access Patterns |date=2014-10-06 |title=Proceedings of the 8th International Conference on Partitioned Global Address Space Programming Models |chapter-url=https://nic.uoregon.edu/pgas14/papers/pgas14_submission_17.pdf |series=PGAS '14 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=1–10 |doi=10.1145/2676870.2676882 |isbn=978-1-4503-3247-7}}</ref> Further, [[cache coherency]] issues can affect [[multiprocessor]] performance.
[[Computer memory]] is usually described as "[[random access memory|random access]]", but traversals by software will still exhibit patterns that can be exploited for efficiency. Various tools exist to help system designers<ref>{{cite book |last1=Brown |first1=Mary |url=http://dl.acm.org/citation.cfm?id=838115 |title=Memory Access Pattern Analysis |last2=Jenevein |first2=Roy M. |last3=Ullah |first3=Nasr |date=29 November 1998 |isbn=9780769504506 |series=WWC '98: Proceedings of the Workload Characterization: Methodology and Case Studies |publication-date=1998-11-29 |page=105}}</ref> and programmers understand, analyse and improve the memory access pattern, including [[VTune]] and [[Intel Advisor|Vectorization Advisor]],<ref>{{Cite book |last1=Ostadzadeh |first1=S. Arash |last2=Meeuws |first2=Roel J. |last3=Galuzzi |first3=Carlo |last4=Bertels |first4=Koen |chapter=QUAD – A Memory Access Pattern Analyser |series=Lecture Notes in Computer Science |date=2010 |volume=5992 |editor-last=Sirisuk |editor-first=Phaophak |editor2-last=Morgan |editor2-first=Fearghal |editor3-last=El-Ghazawi |editor3-first=Tarek |editor4-last=Amano |editor4-first=Hideharu |title=Reconfigurable Computing: Architectures, Tools and Applications |chapter-url=http://ce-publications.et.tudelft.nl/publications/207_quad__a_memory_access_pattern_analyser.pdf |language=en |___location=Berlin, Heidelberg |publisher=Springer |pages=269–281 |doi=10.1007/978-3-642-12133-3_25 |isbn=978-3-642-12133-3}}</ref><ref>{{Cite book |last1=Che |first1=Shuai |last2=Sheaffer |first2=Jeremy W. |last3=Skadron |first3=Kevin |chapter=Dymaxion: Optimizing memory access patterns for heterogeneous systems |date=2011-11-12 |title=Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis |chapter-url=https://www.cs.virginia.edu/~skadron/Papers/sc11_dymaxion_dist.pdf |series=SC '11 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=1–11 |doi=10.1145/2063384.2063401 |isbn=978-1-4503-0771-0}}</ref><ref>{{Cite book |last=Harrison |first=Luddy |chapter=Examination of a memory access classification scheme for pointer-intensive and numeric programs |date=1996-01-01 |title=Proceedings of the 10th international conference on Supercomputing - ICS '96 |chapter-url=https://dl.acm.org/doi/10.1145/237578.237595 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=133–140 |doi=10.1145/237578.237595 |isbn=978-0-89791-803-9}}</ref><ref>{{cite book|chapter=Online Memory Access Pattern Analysis on an Application Profiling Tool|doi=10.1109/CANDAR.2014.86|isbn=978-1-4799-4152-0|title=2014 Second International Symposium on Computing and Networking|year=2014|last1=Matsubara|first1=Yuki|last2=Sato|first2=Yukinori|pages=602–604|s2cid=16476418}}</ref><ref>{{cite web|title=Putting Your Data and Code in Order: Data and layout|url=https://software.intel.com/en-us/articles/putting-your-data-and-code-in-order-data-and-layout-part-2}}</ref> as well as tools to address [[GPU]] memory access patterns.<ref>{{Cite book |last1=Kim |first1=Yooseong |last2=Shrivastava |first2=Aviral |chapter=CuMAPz: A tool to analyze memory access patterns in CUDA |date=2011-06-05 |title=Proceedings of the 48th Design Automation Conference |chapter-url=https://dl.acm.org/doi/10.1145/2024724.2024754 |series=DAC '11 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=128–133 |doi=10.1145/2024724.2024754 |isbn=978-1-4503-0636-2}}</ref>
=== Strided ===
[[Stride (computing)|Strided]] or simple 2D, 3D access patterns (e.g., stepping through [[multi-dimensional array]]s) are similarly easy to predict, and are found in implementations of [[linear algebra]] algorithms and [[image processing]]. [[Loop tiling]] is an effective approach for improving their locality.<ref>{{Cite book |last1=Kennedy |first1=Ken |last2=McKinley |first2=Kathryn S. |chapter=Optimizing for parallelism and data locality |date=1992-08-01 |title=Proceedings of the 6th international conference on Supercomputing - ICS '92 |chapter-url=https://www.cs.utexas.edu/~mckinley/papers/par-mem-ics-92.pdf |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=323–334 |doi=10.1145/143369.143427 |isbn=978-0-89791-485-7}}</ref> Some systems with [[Direct memory access|DMA]] provided a strided mode for transferring data between subtiles of larger [[2D array]]s and [[scratchpad memory]].
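For illustration, a minimal C sketch of loop tiling applied to a strided traversal (a matrix transpose; the dimensions, names, and tile size are invented for the example and assume <code>N</code> is a multiple of <code>TILE</code>):

<syntaxhighlight lang="c">
#define N    1024
#define TILE 32   /* chosen so a TILE x TILE block of each array fits in cache */

static float a[N][N], b[N][N];

/* In a naive transpose, the writes to b[j][i] stride by N floats per step,
   evicting cache lines before they are reused. Tiling keeps both the read
   and the write working sets cache-resident within each sub-block. */
void transpose_tiled(void)
{
    for (int ii = 0; ii < N; ii += TILE)
        for (int jj = 0; jj < N; jj += TILE)
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    b[j][i] = a[i][j];
}
</syntaxhighlight>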
=== Linear ===
=== Gather ===
In a [[gather (vector addressing)|gather]] memory access pattern, reads are randomly addressed or indexed, whilst the writes are sequential (or linear).<ref name="gpu gems2"/> An example is found in [[inverse texture mapping]], where data can be written out linearly across [[scan line]]s, whilst random access texture addresses are calculated per [[pixel]].
Compared to scatter, the disadvantage is that caching (and bypassing latencies) is now essential for efficient reads of small elements; however, it is easier to parallelise since the writes are guaranteed not to overlap. As such, the gather approach is more common for [[GPGPU]] programming, where massive threading is used to hide read latencies.
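A minimal C sketch of a gather (the array names are illustrative): each read is indirectly addressed through an index table, while the writes march linearly through the destination, which is why the iterations can safely run in parallel.

<syntaxhighlight lang="c">
#include <stddef.h>

/* Gather: reads are randomly addressed via idx[], writes are sequential.
   No two iterations write the same ___location, so the loop parallelises. */
void gather(float *dst, const float *src, const size_t *idx, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}
</syntaxhighlight>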
=== Combined gather and scatter ===
=== Random ===
{{main|Random access}}
At the opposite extreme is a truly random memory access pattern. A few multiprocessor systems are specialised to deal with these.<ref>{{Cite conference |last=Wichmann |first=Nathan |date=2005 |title=Cray and HPCC: Benchmark Developments and Results from the Past Year |url=https://cug.org/5-publications/proceedings_attendee_lists/2005CD/S05_Proceedings/pages/Authors/Wichmann/Wichmann_paper.pdf |conference=CUG 2005 Proceedings}} See the global random access results for the Cray X1: its vector architecture hides latencies and is not as sensitive to cache coherency.</ref> The [[Partitioned global address space|PGAS]] approach may help by sorting operations by data on the fly (useful when the problem ''is'' figuring out the locality of unsorted data).
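A common source of truly random access is pointer chasing, sketched below in C: each load's address depends on the result of the previous load, so when the nodes are scattered across memory neither caches nor prefetchers can anticipate the next access.

<syntaxhighlight lang="c">
struct node {
    int value;
    struct node *next;   /* may point anywhere in memory */
};

/* Each iteration must wait for the previous load to resolve the next
   address, exposing the full memory latency on every cache miss. */
int sum_list(const struct node *p)
{
    int sum = 0;
    for (; p != NULL; p = p->next)
        sum += p->value;
    return sum;
}
</syntaxhighlight>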
== Approaches ==
=== Data-oriented design ===
[[Data-oriented design]] is an approach intended to maximise the locality of reference, by organising data according to how it is traversed in the various stages of a program (i.e., so that the data layout explicitly mirrors the access pattern), in contrast with the more common [[Object-oriented programming|object-oriented]] approach of organising data around the objects it describes.<ref name=":0" />
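The classic illustration is the choice between an array of structures and a structure of arrays; a minimal C sketch (the <code>Particle</code> layouts and field names are invented for the example):

<syntaxhighlight lang="c">
#define N 4096

/* Object-oriented style layout: all fields of one particle sit together,
   so a pass that only updates x still drags y, z and mass through the cache. */
struct ParticleAoS { float x, y, z, mass; };
static struct ParticleAoS aos[N];

/* Data-oriented layout: each field is stored contiguously, so a pass over
   x alone becomes a purely linear access pattern. */
static struct ParticlesSoA {
    float x[N], y[N], z[N], mass[N];
} soa;

void advect_x(float dx)
{
    for (int i = 0; i < N; i++)
        soa.x[i] += dx;   /* touches only the contiguous x array */
}
</syntaxhighlight>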
== Contrast with locality of reference ==