In computing, a memory access pattern is the pattern with which a system or program reads and writes memory. These patterns differ in their level of locality of reference and drastically affect cache performance,[1] and also have implications for the approach to parallelism and the distribution of workload in shared memory systems.[2][3][4] Computer memory is usually described as 'random access', but traversals by software will still exhibit patterns that can be exploited for efficiency. Memory access patterns also have implications for security,[5][6] which motivates some to try to disguise a program's activity for privacy reasons.[7]
Examples
Sequential
The simplest extreme is the sequential access pattern, where data is read, processed, and written out with straightforward incremented/decremented addressing. These access patterns are highly amenable to prefetching.
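A minimal sketch in C of a sequential traversal (the function and array names are illustrative):

    #include <stddef.h>

    /* Sequential access: read, process and write back each element with
       simply incrementing addresses; hardware prefetchers handle this well. */
    void scale(float *data, size_t n, float k)
    {
        for (size_t i = 0; i < n; ++i)
            data[i] *= k;   /* read, process, write, then advance */
    }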
Strided
Strided or simple 2D, 3D access patterns (e.g. stepping through multi-dimensional arrays) are similarly easy to predict, and are found in implementations of linear algebra algorithms and image processing. Loop tiling is an effective approach to improving their cache behaviour.[8]
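As an illustration, a C sketch of a strided traversal and a tiled variant (the names and tile size are illustrative, not tuned):

    #include <stddef.h>

    /* Summing each column of a row-major matrix: consecutive accesses are
       'cols' elements apart, a fixed stride that prefetchers can predict. */
    void sum_columns(const float *m, size_t rows, size_t cols, float *out)
    {
        for (size_t c = 0; c < cols; ++c) {
            float s = 0.0f;
            for (size_t r = 0; r < rows; ++r)
                s += m[r * cols + c];   /* stride of 'cols' floats */
            out[c] = s;
        }
    }

    /* Loop tiling: process TILE x TILE blocks so each block stays resident
       in cache while it is reused, instead of streaming the whole matrix. */
    #define TILE 32
    void transpose_tiled(const float *src, float *dst, size_t n)
    {
        for (size_t i = 0; i < n; i += TILE)
            for (size_t j = 0; j < n; j += TILE)
                for (size_t ii = i; ii < i + TILE && ii < n; ++ii)
                    for (size_t jj = j; jj < j + TILE && jj < n; ++jj)
                        dst[jj * n + ii] = src[ii * n + jj];
    }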
Spatially coherent
In 3D rendering, access patterns for texture mapping and rasterization of small primitives (with arbitrary distortions of complex surfaces) are far from linear, but can still exhibit spatial locality (e.g. in screen space or texture space). This can be turned into good memory locality via some combination of Morton order and tiling for texture maps and frame buffer data (mapping spatial regions onto cache lines), or by sorting primitives via tile-based deferred rendering.[9]
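For example, a 2D Morton (Z-order) index can be computed by interleaving coordinate bits, so that texels which are close in 2D tend to share cache lines. A common bit-twiddling sketch, assuming 16-bit coordinates:

    #include <stdint.h>

    /* Spread the low 16 bits of v so they occupy the even bit positions. */
    static uint32_t part1by1(uint32_t v)
    {
        v &= 0x0000ffff;
        v = (v | (v << 8)) & 0x00ff00ff;
        v = (v | (v << 4)) & 0x0f0f0f0f;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    /* Interleave x and y bits: 2D-adjacent texels map to nearby addresses. */
    static uint32_t morton2d(uint32_t x, uint32_t y)
    {
        return part1by1(x) | (part1by1(y) << 1);
    }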
Random
At the opposite extreme is a truly random memory access pattern. A few multiprocessor systems are specialised to deal with these.[10] The PGAS approach may help by sorting operations by data on the fly (useful when the problem is figuring out the locality of unsorted data).[11] Data structures which rely heavily on pointer chasing can often produce poor locality of reference, although sorting can sometimes help.
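A sketch of why pointer chasing defeats caches and prefetchers (the node layout is illustrative):

    #include <stddef.h>

    /* Each step depends on the previous load, and nodes may be scattered
       across the heap, so there is no stride for a prefetcher to predict. */
    struct node { struct node *next; int value; };

    long sum_list(const struct node *n)
    {
        long s = 0;
        for (; n != NULL; n = n->next)   /* next address is data-dependent */
            s += n->value;
        return s;
    }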
Approaches
Data oriented design
Data oriented design is an approach intended to maximise locality of reference, by organising data according to how it is traversed in the various stages of a program; this contrasts with the more common object-oriented approach.
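A typical illustration is replacing an 'array of structures' with a 'structure of arrays', so that a pass touching only some fields traverses memory sequentially; a minimal C sketch (the types and names are illustrative):

    #include <stddef.h>

    /* Object-style layout: updating positions drags each particle's
       unrelated fields through the cache as well. */
    struct particle_aos { float x, y, z, vx, vy, vz; int id; };

    /* Data-oriented layout: fields traversed together are contiguous. */
    struct particles_soa {
        float *x, *y, *z;
        float *vx, *vy, *vz;
        size_t count;
    };

    void integrate(struct particles_soa *p, float dt)
    {
        for (size_t i = 0; i < p->count; ++i) {
            p->x[i] += p->vx[i] * dt;   /* sequential over only the data used */
            p->y[i] += p->vy[i] * dt;
            p->z[i] += p->vz[i] * dt;
        }
    }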
Random reads vs. random writes
A task may have some inherent randomness, which can be handled either in the reads or in the writes, i.e. via gather or scatter alternatives. These choices appear in GPGPU and graphics applications:
Sequential reads with scatter writes
The scatter approach may place less load on a cache hierarchy, since a processing element may dispatch writes in a 'fire and forget' manner (bypassing a cache altogether), whilst using predictable prefetching (or even DMA) for its source data. However, it may be harder to parallelise, since there is no guarantee that the writes do not interact.
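A minimal C sketch of the scatter pattern (the names are illustrative; the index array is assumed to hold valid destinations):

    #include <stddef.h>

    /* Scatter: the source and indices are read sequentially, but each
       result goes to a data-dependent destination; two iterations may
       target the same location, which complicates parallelisation. */
    void scatter(const float *src, const size_t *dest_idx, float *dst, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            dst[dest_idx[i]] = src[i];   /* sequential read, random write */
    }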
Historically, forward texture mapping attempted to handle the randomness on the 'writes' side, whilst reading the source texture sequentially.
The PlayStation 2 console used conventional inverse texture mapping, but handled any scatter/gather processing 'on-chip' using eDRAM, whilst 3D model data (and much texture data) was fed in sequentially from main memory by DMA. This is why it lacked support for indexed primitives, and sometimes needed to manage textures 'up front' in the display list.
Gather reads with sequential writes
An algorithm may be re-worked such that the reads are a random gather, whilst the writes are sequential. An example is found in inverse texture mapping, where data can be written out linearly across scanlines, whilst random-access texture addresses are calculated per pixel. The disadvantage is that caching (to bypass latencies) is now essential for the reads; however, it is easier to parallelise, since the writes are guaranteed not to overlap. As such, the gather approach is more common for GPGPU programming,[3] where massive threading is used to hide read latencies.
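The corresponding gather pattern, as a minimal C sketch (the names are illustrative):

    #include <stddef.h>

    /* Gather: writes are sequential and guaranteed not to overlap, while
       each read comes from a data-dependent address; caching or massive
       threading is needed to hide the read latency. */
    void gather(const float *src, const size_t *src_idx, float *dst, size_t n)
    {
        for (size_t i = 0; i < n; ++i)
            dst[i] = src[src_idx[i]];   /* random read, sequential write */
    }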
References
- ^ "data oriented design" (PDF).
- ^ "enhancing cache coherent architectures with memory access patterns for embedded many-core systems" (PDF).
- ^ a b "gpgpu scatter vs gather".
- ^ "Analysis of Energy and Performance of Code Transformations for PGAS-based Data Access Patterns" (PDF).
{{cite web}}
: Cite has empty unknown parameter:|1=
(help) - ^ "Memory Access Pattern Protection for Resource-constrained Devices" (PDF).
- ^ "understanding cache attacks" (PDF).
- ^ "protecting data in the cloud".
- ^ "optimizing for tiling and data locality" (PDF).paper covers loop tiling and implication for parallel code
- ^ "morton order to accelerate texturing" (PDF).
- ^ "Cray and HPCC: Benchmark Developments and Results from the Past Year" (PDF).
{{cite web}}
: Cite has empty unknown parameter:|1=
(help); line feed character in|title=
at position 15 (help)see global random access results for Cray X1. vector architecture for hiding latencies, not so sensitive to cache coherency - ^ "partitioned global address space programming".covers cases where PGAS is a win, where data may not be already sorted, e.g. dealing with complex graphs - see 'science across the irregularity spectrum'