<ref>{{cite web|title=Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures|url=http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5473222&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5473222}}</ref>
<ref name="PGAS programming">{{cite web|title=Analysis of Energy and Performance of Code Transformations for PGAS-based Data Access Patterns}}</ref>
<ref>{{cite web|title=xeon phi optimization|url=https://books.google.co.uk/books?id=DDpUCwAAQBAJ&pg=PA231&lpg=PA231&dq=scatter+memory+access+pattern&source=bl&ots=YiIyhx3udO&sig=QmwMzpNsEbVKma3YuC47Nq0nvmY&hl=en&sa=X&ved=0ahUKEwjo-ZzHv6XNAhXK1xQKHdvuAGYQ6AEILTAC#v=onepage&q=scatter%20memory%20access%20pattern&f=false}}</ref>
Further, [[cache coherency]] issues can affect multiprocessor performance, which means that certain memory access patterns place a ceiling on parallelism (which [[manycore]] approaches seek to break).<ref>{{cite web|title=intel terascale|url=https://cseweb.ucsd.edu/classes/fa12/cse291-c/talks/SCC-80-core-cern.pdf}}</ref>
Computer memory is usually described as '[[random access memory|random access]]', but traversals by software will still exhibit patterns that can be exploited for efficiency. Various tools exist to help programmers understand, analyse and improve the memory access pattern, e.g. [[VTune]] and others
<ref>{{cite web|title=QUAD a memory access pattern analyser|url=http://ce-publications.et.tudelft.nl/publications/207_quad__a_memory_access_pattern_analyser.pdf}}</ref>
<ref>{{cite web|title=Dymaxion: Optimizing Memory Access Patterns for Heterogeneous Systems|url=http://www.cs.virginia.edu/~skadron/Papers/sc11_dymaxion_dist.pdf}}</ref>
<ref>{{cite web|title=evaluation of a memory access classification scheme for pointer intensive and numeric programs|url=http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=555D92F6A16D03EE4A8057EEC1AFA80B?doi=10.1.1.48.4163&rep=rep1&type=pdf}}</ref>
, including tools to address GPU memory access patterns.<ref>{{cite web|title=CuMAPz: a tool to analyze memory access patterns in CUDA|url=http://dl.acm.org/citation.cfm?id=2024754}}</ref>
Memory access patterns also have implications for security,<ref>{{cite web|title=Memory Access Pattern Protection for Resource-constrained Devices|url=https://www.cardis.org/proceedings/cardis_2012/CARDIS2012_14.pdf}}</ref>
<ref>{{cite web|title=understanding cache attacks|url=https://www.rocq.inria.fr/secret/Anne.Canteaut/Publications/RR-5881.pdf}}</ref>
which motivates some to try to disguise a program's activity for privacy reasons.<ref>{{cite web|title=protecting data in the cloud|url=http://news.mit.edu/2013/protecting-data-in-the-cloud-0702}}</ref>
<ref>{{cite web|title=boosting-cloud-security-with----oblivious-ram|url=http://www.information-age.com/technology/security/123457364/boosting-cloud-security-with----oblivious-ram---}} Proposes a RAM design that avoids memory-access-pattern vulnerabilities.</ref>
== Examples ==
=== Sequential ===
The simplest extreme is the [[sequential access]] pattern, where data is read, processed, and written out with straightforward incremented/decremented addressing. These access patterns are highly amenable to [[prefetching]].
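For illustration, a minimal Python sketch of the pattern (the function name is hypothetical, not from any cited source); the point is that the index advances by one element per step, which hardware prefetchers predict easily:

```python
def sequential_scale(src, factor):
    """Read, process, and write with unit-stride (sequential) addressing."""
    dst = [0] * len(src)
    for i in range(len(src)):      # address increases by one element each step,
        dst[i] = src[i] * factor   # so reads and writes are trivially prefetchable
    return dst
```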
=== {{anchor|STRIDED}} Strided ===
Strided or simple 2D, 3D access patterns (e.g. stepping through [[multi-dimensional array]]s) are similarly easy to predict, and are found in implementations of [[linear algebra]] algorithms and [[image processing]]. [[Loop tiling]] is an effective approach.
<ref>{{cite web|title=optimizing for tiling and data locality|url=http://www.cs.utexas.edu/users/mckinley/papers/par-mem-ics-92.pdf}} Covers loop tiling and its implications for parallel code.</ref> Some systems with DMA provided a strided mode for transferring data between subtiles of larger 2D arrays and scratchpad memory.
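As a sketch of the idea (Python used for clarity; the function name and tile size are illustrative assumptions), loop tiling restructures a strided 2D traversal, here a transpose of a flat row-major array, so that reads and writes stay within small cache-friendly blocks:

```python
def transpose_tiled(a, rows, cols, tile=2):
    """Transpose a flat row-major rows x cols array, visiting it in
    tile x tile blocks so both the read and write footprints stay small."""
    out = [0] * (rows * cols)
    for ii in range(0, rows, tile):            # outer loops step block by block
        for jj in range(0, cols, tile):
            for i in range(ii, min(ii + tile, rows)):   # inner loops stay
                for j in range(jj, min(jj + tile, cols)):  # inside one tile
                    out[j * rows + i] = a[i * cols + j]
    return out
```

The untiled version touches a different cache line on every write for large arrays; tiling bounds the working set to roughly two tiles at a time.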
=== {{anchor|LINEAR}} Linear ===
=== Nearest neighbour ===
Nearest neighbour memory access patterns appear in simulation, and are related to sequential/strided patterns. An algorithm may traverse a data structure using information from the nearest neighbours of a data element (in one or more dimensions) to perform a calculation. These are common in physics simulations operating on grids.<ref name="PGAS programming"/>
<ref>{{cite web|title=Quantifying Locality In The Memory Access Patterns of HPC Applications|url=http://www.sdsc.edu/~allans/sc05_locality.pdf}}mentions nearest neighbour access patterns in clusters</ref>
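A minimal illustrative sketch (Python; the function name is an assumption): a one-dimensional three-point stencil, where each output element reads a grid cell together with its two nearest neighbours:

```python
def stencil_1d(x):
    """Each output reads x[i-1], x[i], x[i+1]: a nearest-neighbour
    access pattern typical of grid-based physics simulations."""
    return [(x[i - 1] + x[i] + x[i + 1]) / 3 for i in range(1, len(x) - 1)]
```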
=== {{anchor|SCATTER}} "Scatter" memory access pattern ===
A [[Scatter (vector addressing)|scatter]] memory access pattern combines sequential reads with indexed/random addressing for writes. Compared to [[#GATHER|gather]], it may place less load on a cache hierarchy since a [[processing element]] may dispatch writes in a 'fire and forget' manner (bypassing a cache altogether), whilst using predictable prefetching (or even DMA) for its source data. However, it may be harder to parallelise since there is no guarantee the writes do not interact.<ref name="gpu gems"/>
In the past, [[forward texture mapping]] attempted to handle the randomness on the write side, whilst reading source texture information sequentially.
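A hedged sketch of the pattern in Python (function and parameter names are illustrative): the source and index arrays are read sequentially, while writes land at arbitrary addresses:

```python
def scatter(src, idx, out_size):
    """Sequential reads of src and idx; indexed 'fire and forget' writes.
    If idx contains duplicates, later writes silently overwrite earlier
    ones, which is why parallelising scatter is hard in general."""
    out = [0] * out_size
    for i in range(len(src)):     # reads advance sequentially (prefetchable)
        out[idx[i]] = src[i]      # writes go to arbitrary, indexed addresses
    return out
```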
=== {{anchor|GATHER}} "Gather" memory access pattern ===
In a [[gather (vector addressing)|gather]] memory access pattern, reads are randomly addressed or indexed, whilst the writes are sequential (or [[#LINEAR|linear]]). An example is found in [[inverse texture mapping]], where data can be written out linearly across scanlines, whilst random access texture addresses are calculated per pixel. Compared to [[#SCATTER|scatter]], the disadvantage is that caching (and bypassing latencies) is now essential for reads; however, it is easier to parallelise since the writes are guaranteed not to overlap. As such, the gather approach is more common for [[gpgpu]] programming.
<ref name = "gpu gems">{{cite web|title = GPU gems|url=https://books.google.co.uk/books?id=lGMzmbUhpiAC&pg=PA51&lpg=PA51&dq=scatter+memory+access+pattern&source=bl&ots=nwKeqfQHeV&sig=lsFTC52P_Fu7EU1-JIlwqDdZozg&hl=en&sa=X&ved=0ahUKEwjtxZOHxqXNAhWEOxoKHWPsAJgQ6AEISTAG#v=onepage&q=scatter%20memory%20access%20pattern&f=false}}deals with 'scatter memory access patterns' and 'gather memory access patterns' in the text</ref>
''[[Locality of reference]]'' refers to a property exhibited by ''memory access patterns''.
A programmer will ''change'' the memory access pattern (by reworking algorithms) to ''improve'' the locality of reference.<ref>{{cite web|title=optimize-data-structures-and-memory-access-patterns-to-improve-data-locality|url=https://software.intel.com/en-us/articles/optimize-data-structures-and-memory-access-patterns-to-improve-data-locality}}</ref>
A programmer or system designer may create frameworks or abstractions (e.g. [[C++ templates]] or [[higher-order functions]]) that encapsulate a specific memory access pattern.
<ref>{{cite web|title=Template-based Memory Access Engine for Accelerators in SoCs|url=http://www.cs.utah.edu/~zfang/asp_dac11.pdf}}</ref>
<ref>{{cite web|title=Multi-Target Vectorization With MTPS C++ Generic Library|url=http://www.metz.supelec.fr/metz/recherche/publis_pdf/Supelec753.pdf}}a C++ template library for producing optimised memory access patterns</ref>
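As a small sketch of such an abstraction (Python; the name is hypothetical), a higher-order function can fix the traversal pattern while the caller supplies only the per-element work:

```python
def for_each_strided(data, start, stride, fn):
    """Encapsulates a strided access pattern: the traversal order is
    fixed here, while callers pass in only the per-element operation."""
    return [fn(data[i]) for i in range(start, len(data), stride)]
```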
Different considerations for memory access patterns appear in parallelism beyond locality of reference, namely the separation of reads and writes. E.g.: even if the reads and writes are 'perfectly' local, it can be impossible to parallelise due to dependencies; separating the reads and writes into separate areas yields a ''different'' memory access pattern, which may initially appear worse in pure ''locality'' terms, but is desirable for leveraging modern parallel hardware.
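To illustrate the point (Python sketch; the function name and the smoothing operation are illustrative assumptions), writing results to a separate output array removes the read-after-write dependencies that an in-place version would have between iterations:

```python
def smooth_out_of_place(x):
    """Reads come only from the input, writes go only to the output, so
    every element could be computed in parallel. An in-place version
    (x[i] = (x[i-1] + x[i+1]) / 2) would read values already overwritten
    by earlier iterations, serialising the loop."""
    return [x[i] if i in (0, len(x) - 1) else (x[i - 1] + x[i + 1]) / 2
            for i in range(len(x))]
```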
{{Reflist}}