A few multiprocessor systems are specialised to deal with this.<ref>{{cite web|title=Cray and HPCC: Benchmark Developments and Results from the Past Year|url=https://cug.org/5-publications/proceedings_attendee_lists/2005CD/S05_Proceedings/pages/Authors/Wichmann/Wichmann_paper.pdf}} See the global random access results for the Cray X1; its vector architecture hides latencies and is not so sensitive to cache coherency.</ref>
The [[Partitioned global address space|PGAS]] approach may help by sorting operations by data on the fly (useful when the problem ''is'' figuring out the locality of unsorted data).<ref>{{cite web|title=partitioned global address space programming|url=https://www.youtube.com/watch?v=NU4VfjISk2M}} Covers cases where PGAS is a win, i.e. where the data may not already be sorted, e.g. when dealing with complex graphs; see 'science across the irregularity spectrum'.</ref>
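A minimal single-process sketch in C of this "sort operations by data" idea (the partition count, element counts, and update stream are all illustrative; a real PGAS runtime such as UPC or Chapel would distribute the partitions across nodes):

<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>

/* Illustrative sketch: bucket random updates by the partition that
   owns the target element, so each partition then applies its batch
   with local, cache-friendly access instead of fine-grained random
   access. NPART, NELEM and NUPD are arbitrary example sizes. */
#define NPART 4
#define NELEM 1024              /* elements per partition */

typedef struct { int index; int value; } update_t;

int main(void) {
    int data[NPART][NELEM] = {{0}};

    enum { NUPD = 10000 };
    static update_t stream[NUPD];
    static update_t bucket[NPART][NUPD];  /* worst-case sizing, for brevity */
    int count[NPART] = {0};

    /* a stream of updates to random global indices */
    for (int i = 0; i < NUPD; i++) {
        stream[i].index = rand() % (NPART * NELEM);
        stream[i].value = 1;
    }

    /* phase 1: sort operations by data location (owning partition) */
    for (int i = 0; i < NUPD; i++) {
        int p = stream[i].index / NELEM;
        bucket[p][count[p]++] = stream[i];
    }

    /* phase 2: each partition applies its own batch locally */
    for (int p = 0; p < NPART; p++)
        for (int i = 0; i < count[p]; i++)
            data[p][bucket[p][i].index % NELEM] += bucket[p][i].value;

    printf("applied %d updates across %d partitions\n", NUPD, NPART);
    return 0;
}
</syntaxhighlight>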
 
Data structures which rely heavily on [[pointer chasing]] can often produce poor locality of reference, although sorting can sometimes help.
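A small C sketch of the point above, under assumed example sizes: traversing a heap-allocated linked list forces dependent, unpredictable loads, while a one-off compaction into a contiguous array restores sequential locality for later traversals.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>

/* Pointer chasing vs. a compacted copy. Each list load depends on the
   previous node's pointer; the flat array can be streamed sequentially.
   N is arbitrary; cleanup is omitted for brevity. */
typedef struct node { int value; struct node *next; } node_t;

int main(void) {
    enum { N = 100000 };
    node_t *head = NULL;
    for (int i = 0; i < N; i++) {        /* heap order is not */
        node_t *n = malloc(sizeof *n);   /* traversal order   */
        n->value = i;
        n->next = head;
        head = n;
    }

    /* pointer-chasing traversal: poor locality of reference */
    long sum = 0;
    for (node_t *p = head; p; p = p->next)
        sum += p->value;

    /* one-off compaction: later traversals become sequential reads */
    int *flat = malloc(N * sizeof *flat);
    int k = 0;
    for (node_t *p = head; p; p = p->next)
        flat[k++] = p->value;

    long sum2 = 0;
    for (int i = 0; i < N; i++)
        sum2 += flat[i];

    printf("%ld %ld\n", sum, sum2);
    return 0;
}
</syntaxhighlight>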
 
=== Combinations ===
 
==== Random reads vs. random writes ====
An algorithm may have inherently sequential and random components, and may offer a choice of handling the random component through reads or through writes: for example, a sequential read combined with random writes (scatter), or random reads (gather) combined with a sequential write. Both have trade-offs. Scattered writes may bypass the need for [[caching]], since a [[processing element]] can dispatch the writes and move on without waiting for them to complete; on the other hand, scattered writes may overlap, which makes parallelism harder. This consideration appears in GPGPU programming: in the past, 'forward texture mapping' attempted to handle the randomness with writes, but the inverse approach, based on gathered reads, is now more widespread.
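A short C sketch contrasting the two combinations (array sizes and data are arbitrary): scatter reads its input sequentially and writes to computed locations, while gather reads from computed locations and writes its output sequentially.

<syntaxhighlight lang="c">
#include <stdio.h>
#include <stdlib.h>

enum { N = 16, M = 8 };  /* example sizes only */

int main(void) {
    int src[N], idx[N], out[M] = {0};
    for (int i = 0; i < N; i++) { src[i] = i; idx[i] = rand() % M; }

    /* scatter: sequential read of src, random writes into out.
       Overlapping targets (repeated idx values) make this hard
       to parallelise safely. */
    for (int i = 0; i < N; i++)
        out[idx[i]] += src[i];

    /* gather: random reads from out, sequential write of dst.
       Each output element is independent, so this parallelises
       easily. */
    int dst[N];
    for (int i = 0; i < N; i++)
        dst[i] = out[idx[i]];

    for (int i = 0; i < N; i++) printf("%d ", dst[i]);
    printf("\n");
    return 0;
}
</syntaxhighlight>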
 
== Approaches ==