General-purpose computing on graphics processing units: Difference between revisions

Revision as of 03:13, 14 August 2025 by Tiggerjay (minor edit: moved page General-purpose computing on graphics processing units to General-purpose computing on graphics processing units (software) without leaving a redirect: Perform technical move requested at WP:RM/TR: reverting undiscussed move; additional cleanup required. Tag: pageswap GUI)
Latest revision as of 10:11, 22 August 2025 by KylieTastic (edit summary: "no longer true")
(11 intermediate revisions by 8 users not shown)
Line 1:
{{Short description|Use of a GPU for computations typically assigned to CPUs}}
{{Use dmy dates|date=January 2015}}
{{More citations needed|date=February 2022}}

Line 69 ⟶ 70:
===Vectorization===
{{See also|Vector_processor#GPU_vector_processing_features|SIMD|SWAR|Single instruction, multiple threads{{!}}SIMT}}
{{Unreferenced section|date=July 2017}}
Most operations on the GPU are performed in a vectorized fashion: one operation can be performed on up to four values at once.{{Disputed inline|date=July 2025}} For example, if one color {{angbr|R1, G1, B1}} is to be modulated by another color {{angbr|R2, G2, B2}}, the GPU can produce the resulting color {{angbr|R1*R2, G1*G2, B1*B2}} in one operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional).{{citation needed|date=July 2017}} Examples include vertices, colors, normal vectors, and texture coordinates.

Line 92 ⟶ 93:
GPUs have very large [[Register file|register files]], which allow them to reduce context-switching latency. Register file size has also grown across GPU generations; for example, the total register file sizes of Maxwell (GM200), Pascal and Volta GPUs are 6 MiB, 14 MiB and 20 MiB, respectively.<ref>"[https://devblogs.nvidia.com/parallelforall/inside-pascal/ Inside Pascal: Nvidia’s Newest Computing Platform] {{webarchive|url=https://web.archive.org/web/20170507110037/https://devblogs.nvidia.com/parallelforall/inside-pascal/ |date=7 May 2017 }}"</ref><ref>"[https://devblogs.nvidia.com/inside-volta/ Inside Volta: The World’s Most Advanced Data Center GPU] {{webarchive|url=https://web.archive.org/web/20200101171030/https://devblogs.nvidia.com/inside-volta/ |date=1 January 2020 }}"</ref> By comparison, the size of a [[Processor register|register file on CPUs]] is small, typically tens or hundreds of kilobytes.
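The 6, 14 and 20 MiB figures can be reproduced with simple multiplication. The sketch below is a hedged worked example under stated assumptions that the text above does not give: it assumes each streaming multiprocessor (SM) holds 65,536 32-bit registers (256 KiB), and that the quoted totals correspond to the Tesla M40 (24 SMs), Tesla P100 (56 SMs) and Tesla V100 (80 SMs) products. The SM counts, the constant names and the helper <code>register_file_mib</code> are illustrative assumptions, not figures from the cited sources.

```python
# Hedged worked example: total register file = SM count x registers per SM x bytes per register.
# ASSUMPTIONS (not stated in the article text): 65,536 32-bit registers per SM, and the
# per-product SM counts below (Tesla M40, Tesla P100, Tesla V100).

REGISTERS_PER_SM = 65_536   # assumed number of 32-bit registers per streaming multiprocessor
BYTES_PER_REGISTER = 4      # a 32-bit register occupies 4 bytes
MIB = 1024 * 1024           # bytes per mebibyte

# Hypothetical mapping of architecture to assumed enabled SM count.
ASSUMED_SM_COUNTS = {
    "Maxwell (GM200, Tesla M40)": 24,
    "Pascal (GP100, Tesla P100)": 56,
    "Volta (GV100, Tesla V100)": 80,
}


def register_file_mib(sm_count: int) -> float:
    """Return the total register file size in MiB for a GPU with `sm_count` SMs."""
    return sm_count * REGISTERS_PER_SM * BYTES_PER_REGISTER / MIB


if __name__ == "__main__":
    per_sm_kib = REGISTERS_PER_SM * BYTES_PER_REGISTER // 1024  # 256 KiB per SM
    for name, sms in ASSUMED_SM_COUNTS.items():
        print(f"{name}: {sms} SMs x {per_sm_kib} KiB = {register_file_mib(sms):.0f} MiB")
```

Under these assumptions each SM contributes 256 KiB, so the totals scale linearly with the SM count; the growth between generations comes mainly from adding SMs rather than enlarging each SM's register file.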
In essence, almost all GPU workloads are inherently massively parallel and follow a LOAD-COMPUTE-STORE pattern; [[tiled rendering]] is one example. Because of the [[Random-access_memory#Memory_wall|memory wall]] problem, even storing one temporary vector for later reuse (LOAD-COMPUTE-STORE-COMPUTE-LOAD-COMPUTE-STORE) is so expensive that it is to be avoided wherever possible.<ref>{{cite book | last1=Li | first1=Jie | last2=Michelogiannakis | first2=George | last3=Cook | first3=Brandon | last4=Cooray | first4=Dulanya | last5=Chen | first5=Yong | title=High Performance Computing | chapter=Analyzing Resource Utilization in an HPC System: A Case Study of NERSC's Perlmutter | series=Lecture Notes in Computer Science | date=2023 | volume=13948 | pages=297–316 | doi=10.1007/978-3-031-32041-5_16 | isbn=978-3-031-32040-8 | chapter-url=https://link.springer.com/chapter/10.1007/978-3-031-32041-5_16 }}</ref> As a result, register files ''have'' to be large. In standard CPUs, [[Cache (computing)|caches]] (a [[D-cache]]) can be introduced to solve this problem; however, caches are relatively so large that they are impractical in GPUs, which would need one per Processing Element. [[ILLIAC IV]] innovatively solved the problem around 1967 by introducing a local memory per Processing Element (a PEM), a strategy later copied by the [[Flynn%27s_taxonomy#Associative_processor|Aspex ASP]].

===Energy efficiency===