GPUs have very large [[Register file|register files]], which allow them to reduce context-switching latency. Register file size has also grown across GPU generations: the total register file size on Maxwell (GM200), Pascal and Volta GPUs is 6 MiB, 14 MiB and 20 MiB, respectively.<ref>"[https://devblogs.nvidia.com/parallelforall/inside-pascal/ Inside Pascal: Nvidia’s Newest Computing Platform] {{webarchive|url=https://web.archive.org/web/20170507110037/https://devblogs.nvidia.com/parallelforall/inside-pascal/ |date=7 May 2017 }}"</ref><ref>"[https://devblogs.nvidia.com/inside-volta/ Inside Volta: The World’s Most Advanced Data Center GPU] {{webarchive|url=https://web.archive.org/web/20200101171030/https://devblogs.nvidia.com/inside-volta/ |date=1 January 2020 }}"</ref> By comparison, a [[Processor register|register file on a CPU]] is small, typically tens or hundreds of kilobytes. The underlying reason is that almost all GPU workloads, such as [[Tiled rendering]], are massively parallel LOAD-COMPUTE-STORE pipelines. Even storing one temporary vector for later reuse (LOAD-COMPUTE-STORE-COMPUTE-LOAD-COMPUTE-STORE) is so expensive, because of the [[Random-access_memory#Memory_wall|memory wall]], that it is avoided wherever possible.<ref>{{cite book | last1=Li | first1=Jie | last2=Michelogiannakis | first2=George | last3=Cook | first3=Brandon | last4=Cooray | first4=Dulanya | last5=Chen | first5=Yong | title=High Performance Computing | chapter=Analyzing Resource Utilization in an HPC System: A Case Study of NERSC's Perlmutter | series=Lecture Notes in Computer Science | date=2023 | volume=13948 | pages=297–316 | doi=10.1007/978-3-031-32041-5_16 | isbn=978-3-031-32040-8 | chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-32041-5_16 }}</ref> As a result, register file size ''has'' to keep increasing.
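
The cost of spilling an intermediate vector can be illustrated with a simple count of memory traffic. The following is a minimal sketch under stated assumptions, not a model of any real GPU: elements are 4-byte floats, every load and store goes to off-chip memory, there are no caches, and the function names are hypothetical. It compares a fused pipeline, where the intermediate stays in registers, with one that stores the intermediate and reloads it.

```python
# Minimal memory-traffic model (assumptions: 4-byte elements, every access
# goes to off-chip DRAM, no caches). Function names are hypothetical.
BYTES_PER_ELEMENT = 4


def fused_traffic(n: int) -> int:
    """LOAD x, COMPUTE g then f in registers, STORE y."""
    loads = n          # read x once
    stores = n         # write y once
    return (loads + stores) * BYTES_PER_ELEMENT


def unfused_traffic(n: int) -> int:
    """LOAD x, COMPUTE g, STORE tmp, LOAD tmp, COMPUTE f, STORE y."""
    loads = n + n      # read x, then read tmp back
    stores = n + n     # write tmp, then write y
    return (loads + stores) * BYTES_PER_ELEMENT


if __name__ == "__main__":
    n = 1 << 20  # one million elements
    fused = fused_traffic(n)
    unfused = unfused_traffic(n)
    print(f"fused:   {fused} bytes")
    print(f"unfused: {unfused} bytes ({unfused // fused}x the traffic)")
```

Under these assumptions the unfused version moves twice as many bytes for the same arithmetic, so a memory-bound kernel would take roughly twice as long; keeping intermediates in registers avoids that extra traffic.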
In conventional CPUs this problem is mitigated with [[Cache (computing)|caches]] (a [[D-cache]]), but caches are relatively large and would be impractical in a GPU, which would need one per processing element. [[ILLIAC IV]] solved the problem around 1967 by giving each processing element its own local memory (a PEM), a strategy later copied by the [[Flynn%27s_taxonomy#Associative_processor|Aspex ASP]].

===Energy efficiency===

General-purpose computing on graphics processing units: Difference between revisions