Content deleted Content added
Notifying subject page of move discussion on Talk:General-purpose computing on graphics processing units (software) |
Citation bot (talk | contribs) Altered template type. Add: chapter-url, isbn, doi, pages, volume, date, series, chapter, title, authors 1-5. Removed or converted URL. Changed bare reference to CS1/2. Upgrade ISBN10 to 13. | Use this bot. Report bugs. | Suggested by Jay8g | #UCB_toolbar |
||
Line 96:
GPUs have very large [[Register file|register files]], which allow them to reduce context-switching latency. Register file size is also increasing over different GPU generations, e.g., the total register file size on Maxwell (GM200), Pascal and Volta GPUs are 6 MiB, 14 MiB and 20 MiB, respectively.<ref>"[https://devblogs.nvidia.com/parallelforall/inside-pascal/ Inside Pascal: Nvidia’s Newest Computing Platform] {{webarchive|url=https://web.archive.org/web/20170507110037/https://devblogs.nvidia.com/parallelforall/inside-pascal/ |date=7 May 2017 }}"</ref><ref>"[https://devblogs.nvidia.com/inside-volta/ Inside Volta: The World’s Most Advanced Data Center GPU] {{webarchive|url=https://web.archive.org/web/20200101171030/https://devblogs.nvidia.com/inside-volta/ |date=1 January 2020 }}"</ref> By comparison, the size of a [[Processor register|register file on CPUs]] is small, typically tens or hundreds of kilobytes.
In essence: almost all GPU workloads are inherently massively-parallel LOAD-COMPUTE-STORE in nature, such as [[Tiled rendering]]. Even storing one temporary vector for further recall (LOAD-COMPUTE-STORE-COMPUTE-LOAD-COMPUTE-STORE) is so expensive due to the [[Random-access_memory#Memory_wall|Memory wall]] problem that it is to be avoided at all costs.<ref>{{cite book | last1=Li | first1=Jie | last2=Michelogiannakis | first2=George | last3=Cook | first3=Brandon | last4=Cooray | first4=Dulanya | last5=Chen | first5=Yong | title=High Performance Computing | chapter=Analyzing Resource Utilization in an HPC System: A Case Study of NERSC's Perlmutter | series=Lecture Notes in Computer Science | date=2023 | volume=13948 | pages=297–316 | doi=10.1007/978-3-031-32041-5_16 | isbn=978-3-031-32040-8 | chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-32041-5_16 }}</ref> The result is that register file size ''has'' to increase. In standard CPUs it is possible to introduce [[Cache (computing)|caches]] (a [[D-cache]]) to solve this problem, however these are relativrly so large that they are impractical to introduce in GPUs which would need one per Processing Element. [[ILLIAC IV]] innovatively solved the problem around 1967 by introducing a local memory per Processing Element (a PEM): a strategy copied by the [[Flynn%27s_taxonomy#Associative_processor|Aspex ASP]].
===Energy efficiency===
|