Revision as of 23:53, 14 July 2020 by Artoria2e5 (no edit summary)
Revision as of 23:55, 14 July 2020 by Artoria2e5 (no edit summary)
Line 1:
'''Single instruction, multiple thread''' ('''SIMT''') is an execution model used in [[parallel computing]] where [[single instruction, multiple data]] (SIMD) is combined with [[Thread (computing)#Multithreading|multithreading]]. It is different from [[SPMD]] in that all instructions in all "threads" are run in lock-step. The SIMT execution model has been implemented on several [[GPU]]s and is relevant for [[general-purpose computing on graphics processing units]] (GPGPU), e.g. some [[supercomputer]]s combine CPUs with GPUs.

~~{{missing information||difference from [[SPMD]], if any}}~~

~~==Overview==~~

The processors, say a number {{mvar|p}} of them, seem to execute many more than {{mvar|p}} tasks. This is achieved by each processor having multiple "threads" (or "work-items" or "Sequence of SIMD Lane operations"), which execute in lock-step, and are analogous to [[SIMD lanes]].<ref>{{cite book |author1=Michael McCool |author2=James Reinders |author3=Arch Robison |title=Structured Parallel Programming: Patterns for Efficient Computation |publisher=Elsevier |year=2013 |page=52}}</ref>

==History==
SIMT was introduced by [[Nvidia]]:<ref>{{cite web |url=http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf |title=Nvidia Fermi Compute Architecture Whitepaper |date=2009 |website=http://www.nvidia.com/ |publisher=NVIDIA Corporation |accessdate=2014-07-17}}</ref><ref name=teslaPaper>{{cite journal |title=NVIDIA Tesla: A Unified Graphics and Computing Architecture |date=2008 |page=6 {{subscription required|s}} |doi=10.1109/MM.2008.31 |volume=28 |issue=2 |journal=IEEE Micro |last1=Lindholm |first1=Erik |last2=Nickolls |first2=John |last3=Oberman |first3=Stuart |last4=Montrym |first4=John }}</ref>

Line 12 ⟶ 11:
[[ATI Technologies]] (now [[Advanced Micro Devices|AMD]]) released a competing product slightly later, on May 14, 2007: the [[TeraScale (microarchitecture)#TeraScale 1|TeraScale 1]]-based ''"R600"'' GPU chip.

== Description ==

Because the access time of all widespread [[random-access memory|RAM]] types (e.g. [[DDR SDRAM]], [[GDDR SDRAM]], [[XDR DRAM]], etc.) is still relatively high, engineers came up with the idea of hiding the latency that inevitably comes with each memory access. Strictly, the latency hiding is a feature of the zero-overhead scheduling implemented by modern GPUs. This might or might not be considered a property of 'SIMT' itself.

Line 20 ⟶ 21:
{| class="wikitable" style="font-size:80%; text-align: center"
|+ SIMT Terminology
! Nvidia [[CUDA]] || [[OpenCL]] || Hennessy & Patterson<ref>{{cite book |author1=John L. Hennessy |author2=David A. Patterson |title=Computer Architecture: A Quantitative Approach |year=1990 |url=https://archive.org/details/computerarchitec00patt_045 |url-access=limited |publisher=Morgan Kaufmann |edition=6 |pages=[https://archive.org/details/computerarchitec00patt_045/page/n263 314] ff}}</ref>
|-

Single instruction, multiple threads: Difference between revisions