Single instruction, multiple threads

'''Single instruction, multiple threads''' (SIMT) is an execution model used in [[parallel computing]] where [[single instruction, multiple data]] (SIMD) is combined with [[Thread (computing)#Multithreading|multithreading]].

The processors, say {{mvar|p}} of them, appear to execute many more than {{mvar|p}} tasks. This is achieved by each processor having multiple "threads" (or "work-items" or "sequences of SIMD lane operations"), which execute in lock-step, and are analogous to SIMD "lanes".<ref>{{cite book |author1=Michael McCool |author2=James Reinders |author3=Arch Robison |title=Structured Parallel Programming: Patterns for Efficient Computation |publisher=Elsevier |year=2013 |page=52}}</ref>
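As an illustration, consider the following CUDA sketch (the kernel and launch configuration here are hypothetical examples, not taken from the cited sources): a single scalar-looking kernel is instantiated once per data element, and the hardware executes the resulting threads in lock-step groups ("warps" of 32 threads on Nvidia hardware), each thread playing the role of one SIMD lane.

<syntaxhighlight lang="cuda">
#include <cuda_runtime.h>
#include <cstdio>

// A scalar-looking kernel, instantiated once per data element. The GPU
// executes the resulting threads in lock-step groups ("warps" of 32 on
// Nvidia hardware), each thread acting as one SIMD lane.
__global__ void scale(float *x, float s, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // unique per-thread index
    if (i < n)
        x[i] = s * x[i];
}

int main()
{
    const int n = 1 << 20;                 // far more threads than physical lanes
    float *x;
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(x, 2.0f, n);  // one thread per element
    cudaDeviceSynchronize();

    printf("%f\n", x[0]);                  // prints 2.000000
    cudaFree(x);
    return 0;
}
</syntaxhighlight>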
 
The SIMT execution model has been implemented on several GPUs and is relevant for [[general-purpose computing on graphics processing units]] (GPGPU); for example, some [[supercomputer]]s combine CPUs with GPUs.
 
SIMT was introduced by [[Nvidia]]:<ref>{{cite web |url=http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf |title=Nvidia Fermi Compute Architecture Whitepaper |date=2009 |website=nvidia.com |publisher=NVIDIA Corporation |accessdate=2014-07-17}}</ref><ref name=teslaPaper>{{cite web |url=http://dx.doi.org/10.1109/MM.2008.31 |title=NVIDIA Tesla: A Unified Graphics and Computing Architecture |date=2008 |website=ieee.org |publisher=IEEE |accessdate=2014-08-07 |page=6 {{subscription required}} }}</ref>
 
{{Quote|[Nvidia's [[Tesla (microarchitecture)|Tesla GPU microarchitecture]]] (first available November 8, 2006 as implemented in the ''"G80"'' GPU chip) introduced the single-instruction multiple-thread (SIMT) execution model where multiple independent threads execute concurrently using a single instruction.}}
 
[[ATI Technologies]] (now [[Advanced Micro Devices|AMD]]) released a competing product slightly later on May 14, 2007, the [[TeraScale (microarchitecture)#TeraScale 1|TeraScale 1]]-based ''"R600"'' GPU chip.
 
SIMT is intended to limit [[instruction fetching]] overhead,<ref>{{cite conference |first1=Sean |last1=Rul |first2=Hans |last2=Vandierendonck |first3=Joris |last3=D’Haene |first4=Koen |last4=De Bosschere |title=An experimental study on performance portability of OpenCL kernels |year=2010 |conference=Symp. Application Accelerators in High Performance Computing (SAAHPC)}}</ref> and is used in modern GPUs (including, but not limited to, those of [[Nvidia]] and [[AMD]]) in combination with 'latency hiding' to enable high-performance execution despite considerable latency in memory-access operations. Under this scheme, the processor is oversubscribed with computation tasks, and is able to quickly switch between tasks when it would otherwise have to wait on memory. This strategy is comparable to [[Multithreading (computer architecture)|multithreading in CPUs]] (not to be confused with [[Multi-core processor|multi-core]]).<ref>{{cite web |url=http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/12-advanced_topics_in_cuda.pdf |title=Advanced Topics in CUDA |date=2011 |website=cc.gatech.edu |accessdate=2014-08-28}}</ref>
 
<!-- Strictly, the latency-hiding is a feature of the zero-overhead scheduling implemented by modern GPUs... this might or might not be considered to be a property of 'SIMT' itself -->
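The effect can be sketched in CUDA (an illustrative, hypothetical kernel, not taken from the cited sources): a memory-bound kernel is launched with many more threads than the GPU has execution lanes, so that whenever one group of threads stalls on a memory access, the hardware scheduler can issue instructions from another.

<syntaxhighlight lang="cuda">
// Hypothetical memory-bound kernel: the data-dependent load from in[idx[i]]
// can stall for hundreds of cycles. With many warps resident on each
// multiprocessor, the scheduler switches to another warp during the stall
// at essentially zero cost, hiding the memory latency.
__global__ void gather(float *out, const float *in, const int *idx, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[idx[i]];   // long-latency, data-dependent memory access
}

// Deliberate oversubscription: one thread per element, far more threads
// than physical lanes. How many warps fit per multiprocessor ("occupancy")
// bounds how much latency can be hidden.
// gather<<<(n + 255) / 256, 256>>>(d_out, d_in, d_idx, n);
</syntaxhighlight>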
 
A downside of SIMT execution is that thread-specific control flow is performed using "masking", leading to poor utilization where a processor's threads follow different control-flow paths. For instance, to handle an ''IF''-''ELSE'' block where various threads of a processor execute different paths, all threads must actually process both paths (as all threads of a processor always execute in lock-step), but masking is used to disable and enable the various threads as appropriate. Masking is avoided when control flow is coherent for the threads of a processor, i.e. they all follow the same path of execution. The masking strategy is what distinguishes SIMT from ordinary SIMD, and has the benefit of inexpensive synchronization between the threads of a processor.<ref>{{cite book |author1=Michael McCool |author2=James Reinders |author3=Arch Robison |title=Structured Parallel Programming: Patterns for Efficient Computation |publisher=Elsevier |year=2013 |pages=209 ff.}}</ref>
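A minimal CUDA sketch of such divergence follows (a hypothetical kernel, not drawn from the cited sources); the comments describe the masking behaviour for the threads of one warp.

<syntaxhighlight lang="cuda">
// Hypothetical kernel with divergent control flow. All threads of a warp
// execute in lock-step, so if the condition differs within a warp, the
// hardware serializes the two paths, masking off the inactive threads.
__global__ void divergent(float *x, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (x[i] > 0.0f)
        x[i] *= 2.0f;   // pass 1: threads with x[i] > 0 active, rest masked
    else
        x[i] = -x[i];   // pass 2: masks inverted for the ELSE path
}
// If all threads of a warp take the same path (coherent control flow),
// only that path is executed and no masking cost is paid.
</syntaxhighlight>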
 
{| class="wikitable" style="font-size:80%; text-align: center"