General-purpose computing on graphics processing units

{{Short description|Use of a GPU for computations typically assigned to CPUs}}
 
{{Use dmy dates|date=January 2015}}
{{More citations needed|date=February 2022}}
 
===Vectorization===
{{See also|Vector_processor#GPU_vector_processing_features|SIMD|SWAR|Single instruction, multiple threads{{!}}SIMT}}
{{Unreferenced section|date=July 2017}}
Most GPU operations are vectorized: a single operation can be performed on up to four values at once.{{Disputed inline|date=July 2025}} For example, if one color {{angbr|R1, G1, B1}} is to be modulated by another color {{angbr|R2, G2, B2}}, the GPU can produce the resulting color {{angbr|R1*R2, G1*G2, B1*B2}} in a single operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional).{{citation needed|date=July 2017}} Examples include vertices, colors, normal vectors, and texture coordinates.
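
The color-modulation example can be modelled in a few lines of Python. This is a sketch only: the `Vec4` type and the `modulate` function are hypothetical names standing in for a four-wide GPU vector register and a vector multiply instruction. The Python code evaluates the four products one after another, whereas vector hardware of the kind described would produce all four in one operation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec4:
    """Hypothetical 4-lane vector standing in for one GPU vector register (R, G, B, A)."""
    x: float
    y: float
    z: float
    w: float

    def __mul__(self, other: "Vec4") -> "Vec4":
        # Component-wise multiply: one "vector operation" yielding four results.
        return Vec4(self.x * other.x, self.y * other.y,
                    self.z * other.z, self.w * other.w)


def modulate(c1: Vec4, c2: Vec4) -> Vec4:
    """Modulate color c1 by color c2, giving <R1*R2, G1*G2, B1*B2, A1*A2>."""
    return c1 * c2


if __name__ == "__main__":
    surface = Vec4(1.0, 0.5, 0.25, 1.0)  # RGBA
    light = Vec4(0.5, 0.5, 1.0, 1.0)
    print(modulate(surface, light))  # Vec4(x=0.5, y=0.25, z=0.25, w=1.0)
```

The fourth lane holds alpha here; a three-component color would simply leave one lane unused.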
GPUs have very large [[Register file|register files]], which allow them to reduce context-switching latency: because the registers of many resident threads stay on chip at the same time, switching between threads requires no saving or restoring of state. Register file size has also grown across GPU generations; for example, the total register file size of Maxwell (GM200), Pascal and Volta GPUs is 6&nbsp;MiB, 14&nbsp;MiB and 20&nbsp;MiB, respectively.<ref>"[https://devblogs.nvidia.com/parallelforall/inside-pascal/ Inside Pascal: Nvidia’s Newest Computing Platform] {{webarchive|url=https://web.archive.org/web/20170507110037/https://devblogs.nvidia.com/parallelforall/inside-pascal/ |date=7 May 2017 }}"</ref><ref>"[https://devblogs.nvidia.com/inside-volta/ Inside Volta: The World’s Most Advanced Data Center GPU] {{webarchive|url=https://web.archive.org/web/20200101171030/https://devblogs.nvidia.com/inside-volta/ |date=1 January 2020 }}"</ref> By comparison, the [[Processor register|register file on CPUs]] is small, typically tens or hundreds of kilobytes.
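
The totals quoted above follow from simple arithmetic. The sketch below assumes 65,536 32-bit registers (256&nbsp;KiB) per streaming multiprocessor (SM) and the SM counts of the Tesla M40 (24), Tesla P100 (56) and Tesla V100 (80); these per-product figures are assumptions taken from Nvidia's published specifications rather than from this article. It also hints at why a large register file helps with context switching: at a hypothetical budget of 32 registers per thread, one SM can hold the state of 2,048 threads at once.

```python
# Assumed figures (from Nvidia product specifications, not from this article).
REGISTERS_PER_SM = 65_536   # 32-bit registers per streaming multiprocessor
BYTES_PER_REGISTER = 4      # each register is 32 bits wide
MIB = 1024 * 1024

# Assumed SM counts of the Tesla products built on each architecture.
SM_COUNTS = {
    "Maxwell GM200 (Tesla M40)": 24,
    "Pascal (Tesla P100)": 56,
    "Volta (Tesla V100)": 80,
}


def register_file_bytes(num_sms: int) -> int:
    """Total register file size of a GPU with num_sms identical SMs."""
    return num_sms * REGISTERS_PER_SM * BYTES_PER_REGISTER


def resident_threads_per_sm(registers_per_thread: int) -> int:
    """Threads whose registers fit on one SM at once (other occupancy limits ignored)."""
    return REGISTERS_PER_SM // registers_per_thread


if __name__ == "__main__":
    for name, sms in SM_COUNTS.items():
        print(f"{name}: {register_file_bytes(sms) / MIB:g} MiB")  # 6, 14, 20 MiB
    print(resident_threads_per_sm(32))  # 2048
```

Real occupancy also depends on other per-SM limits (for example shared memory and a maximum thread count), which this model deliberately ignores.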
 
In essence, almost all GPU workloads, such as [[tiled rendering]], are inherently massively parallel and LOAD-COMPUTE-STORE in nature. Even storing one temporary vector for later recall (LOAD-COMPUTE-STORE-COMPUTE-LOAD-COMPUTE-STORE) is so expensive, because of the [[Random-access_memory#Memory_wall|memory wall]] problem, that it is to be avoided wherever possible.<ref>{{cite book | last1=Li | first1=Jie | last2=Michelogiannakis | first2=George | last3=Cook | first3=Brandon | last4=Cooray | first4=Dulanya | last5=Chen | first5=Yong | title=High Performance Computing | chapter=Analyzing Resource Utilization in an HPC System: A Case Study of NERSC's Perlmutter | series=Lecture Notes in Computer Science | date=2023 | volume=13948 | pages=297–316 | doi=10.1007/978-3-031-32041-5_16 | isbn=978-3-031-32040-8 | chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-32041-5_16 }}</ref> As a result, register file size ''has'' to increase. In standard CPUs, [[Cache (computing)|caches]] (a [[D-cache]]) can be introduced to mitigate this problem; however, caches are so large relative to a Processing Element that giving each of a GPU's many Processing Elements its own would be impractical. Around 1967, [[ILLIAC IV]] addressed the problem by introducing a local memory per Processing Element (a PEM), a strategy later copied by the [[Flynn's_taxonomy#Associative_processor|Aspex ASP]].
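
The cost of spilling a temporary can be illustrated by counting memory accesses. The sketch below is a simplified model, not a measurement: it assumes a hypothetical kernel computing `out[i] = a[i] * b[i] + c[i]` over `n` elements, counts one access per element read from or written to memory, and treats registers as free. The hypothetical `fused` function keeps the product in a register (a single LOAD-COMPUTE-STORE pass), while `spilled` writes the product to a temporary vector and reads it back in a second pass.

```python
from dataclasses import dataclass


@dataclass
class TrafficCounter:
    """Counts element-sized accesses to memory; register use is not counted."""
    loads: int = 0
    stores: int = 0

    def load(self, array, i):
        self.loads += 1
        return array[i]

    def store(self, array, i, value):
        self.stores += 1
        array[i] = value


def fused(a, b, c, mem):
    """LOAD-COMPUTE-STORE: the product a*b stays in a register."""
    out = [0.0] * len(a)
    for i in range(len(a)):
        t = mem.load(a, i) * mem.load(b, i)  # temporary lives in a register
        mem.store(out, i, t + mem.load(c, i))
    return out


def spilled(a, b, c, mem):
    """The product is stored to a temporary vector and loaded again later."""
    n = len(a)
    tmp = [0.0] * n
    out = [0.0] * n
    for i in range(n):  # first pass: compute and spill the temporary
        mem.store(tmp, i, mem.load(a, i) * mem.load(b, i))
    for i in range(n):  # second pass: recall the temporary
        mem.store(out, i, mem.load(tmp, i) + mem.load(c, i))
    return out


if __name__ == "__main__":
    n = 1000
    a, b, c = [1.0] * n, [2.0] * n, [3.0] * n
    m1, m2 = TrafficCounter(), TrafficCounter()
    assert fused(a, b, c, m1) == spilled(a, b, c, m2)
    print(m1.loads + m1.stores, m2.loads + m2.stores)  # 4000 6000
```

In this model the spill adds one store and one load per element, i.e. 50% more memory traffic for exactly the same arithmetic.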
 
===Energy efficiency===