Revision as of 23:07, 15 March 2015 by Wootery (Added brief description of latency-hiding) compared with revision as of 23:10, 15 March 2015 by Wootery (Attempt to clarify the unfortunate two distinct meanings of "multithreading")
'''Single instruction, multiple thread''' (SIMT) is a [[parallel computing|parallel]] execution model, used in some [[GPGPU]] platforms, where [[Thread (computing)#Multithreading|multithreading]] is simulated by [[SIMD]] processors. A set of {{mvar|p}} such processors appears to execute many more than {{mvar|p}} tasks. This is achieved by giving each processor multiple "threads" (or "work-items"), which execute in lock-step and are analogous to SIMD "lanes".<ref name="spp">{{cite book |author1=Michael McCool |author2=James Reinders |author3=Arch Robison |title=Structured Parallel Programming: Patterns for Efficient Computation |publisher=Elsevier |year=2013 |page=52}}</ref> SIMT was introduced by [[Nvidia]]:<ref>{{cite web |url=http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf |title=Nvidia Fermi Compute Architecture Whitepaper |date=2009 |website=http://www.nvidia.com/ |publisher=NVIDIA Corporation |accessdate=2014-07-17}}</ref><ref name=teslaPaper>{{cite web |url=http://dx.doi.org/10.1109/MM.2008.31 |title=NVIDIA Tesla: A Unified Graphics and Computing Architecture |date=2008 |website=http://www.ieee.org/ |publisher=IEEE |accessdate=2014-08-07 |page=6 {{subscription required|s}} }}</ref>

{{Quote| [The G80 Nvidia GPU architecture] introduced the single-instruction multiple-thread (SIMT) execution model where multiple independent threads execute concurrently using a single instruction.}}

SIMT is intended to limit [[instruction fetching]] overhead,<ref>{{cite conference |first1=Sean |last1=Rul |first2=Hans |last2=Vandierendonck |first3=Joris |last3=D’Haene |first4=Koen |last4=De Bosschere |title=An experimental study on performance portability of OpenCL kernels |year=2010 |conference=Symp. Application Accelerators in High Performance Computing (SAAHPC)}}</ref> and is used in modern GPUs (including, but not limited to, those of [[Nvidia]] and [[AMD]]) in combination with "latency hiding" to enable high-performance execution despite considerable latency in memory-access operations. With latency hiding, the processor is oversubscribed with computation tasks, so that whenever one task would otherwise have to wait on memory, the processor can quickly switch to another task that is ready to run. This strategy is comparable to [[Multithreading (computer architecture)|multithreading]] in CPUs (not to be confused with [[Multi-core processor|multi-core]]).<ref>{{cite web |url=http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/12-advanced_topics_in_cuda.pdf |title=Advanced Topics in CUDA |date=2011 |website=cc.gatech.edu |accessdate=2014-08-28}}</ref> <!-- Strictly, the latency-hiding is a feature of the zero-overhead scheduling implemented by modern GPUs... this might or might not be considered to be a property of 'SIMT' itself -->
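The lock-step execution of SIMT threads on a SIMD processor can be illustrated with a small simulation. The Python sketch below is a toy model, not a description of any vendor's hardware: the class <code>Warp</code>, the function <code>run_kernel</code>, the 8-lane width and the example kernel are all hypothetical. Each instruction is fetched once and applied to every active lane; a data-dependent branch is handled by masking lanes off (a common way of implementing SIMT branching, assumed here), which is how the threads can appear independent while the hardware stays SIMD.

```python
"""Toy model of SIMT execution on a SIMD unit (illustrative sketch, not real hardware).

All threads of one group (a "warp" in Nvidia's terminology) share a single
instruction stream: each instruction is fetched once and applied in lock-step to
every active lane. A data-dependent branch is handled by masking lanes off
(an assumption of this model), so each thread *appears* to follow its own control flow.
"""
from typing import Callable, List

Op = Callable[[int, int], int]  # (thread_id, x) -> new x, applied per lane

WARP_SIZE = 8  # hypothetical width chosen for short output (Nvidia warps have 32 threads)


class Warp:
    def __init__(self, size: int) -> None:
        self.tid = list(range(size))  # per-lane thread index
        self.x = [0] * size           # per-lane register
        self.mask = [True] * size     # which lanes are currently active
        self.issued = 0               # instructions fetched/issued for the whole warp

    def issue(self, op: Op) -> None:
        """Fetch one instruction and execute it on all active lanes in lock-step."""
        self.issued += 1
        for lane, active in enumerate(self.mask):
            if active:
                self.x[lane] = op(self.tid[lane], self.x[lane])

    def _run_path(self, ops: List[Op]) -> None:
        if any(self.mask):  # a path that no lane takes is skipped entirely
            for op in ops:
                self.issue(op)

    def branch(self, cond: Callable[[int], bool],
               then_ops: List[Op], else_ops: List[Op]) -> None:
        """if (cond(tid)) { then_ops } else { else_ops }, executed with lane masks."""
        outer = self.mask
        taken = [m and cond(t) for m, t in zip(outer, self.tid)]
        self.mask = taken
        self._run_path(then_ops)
        self.mask = [m and not t for m, t in zip(outer, taken)]
        self._run_path(else_ops)
        self.mask = outer  # threads reconverge after the branch


def run_kernel(size: int = WARP_SIZE) -> Warp:
    """Hypothetical kernel: x = tid; if tid is even: x *= 10, else: x += 100."""
    warp = Warp(size)
    warp.issue(lambda tid, x: tid)
    warp.branch(lambda tid: tid % 2 == 0,
                then_ops=[lambda tid, x: x * 10],
                else_ops=[lambda tid, x: x + 100])
    return warp


if __name__ == "__main__":
    w = run_kernel()
    print(w.x)       # per-thread results
    print(w.issued)  # instructions issued for the whole warp
```

Running the script prints <code>[0, 101, 20, 103, 40, 105, 60, 107]</code> followed by <code>3</code>: eight threads produced eight results, but only three instructions were issued for the whole group (one for the assignment and one for each side of the branch). Because the even and odd lanes diverge, both sides of the branch occupy issue slots, which is the usual price of sharing one instruction stream.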

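The latency-hiding strategy, in which an oversubscribed processor switches to another task instead of waiting on memory, can likewise be sketched as a cycle-by-cycle simulation. Everything in this Python sketch is a hypothetical, illustrative assumption rather than a measured GPU parameter: the 20-cycle memory latency, the instruction mix, the rule of one issued instruction per cycle, cost-free switching between warps and the round-robin scheduler. It counts how many cycles are spent issuing instructions versus stalled on memory as the number of resident warps grows.

```python
"""Toy model of latency hiding by oversubscription (illustrative sketch only).

Assumptions (all hypothetical, chosen for readability):
  * one issue slot per cycle, shared by all resident warps;
  * switching between warps costs nothing (zero-overhead scheduling);
  * a "load" blocks its warp for MEM_LATENCY cycles, an "alu" op for 1 cycle;
  * the scheduler picks the next ready warp in round-robin order.
"""
from typing import List, Tuple

MEM_LATENCY = 20                             # hypothetical memory latency, in cycles
PROGRAM = ["load", "alu", "alu", "alu"] * 4  # hypothetical per-warp instruction stream


def simulate(num_warps: int, program: List[str] = PROGRAM,
             mem_latency: int = MEM_LATENCY) -> Tuple[int, int]:
    """Return (total_cycles, busy_cycles) for running `program` on every warp."""
    pc = [0] * num_warps        # next instruction index for each warp
    ready_at = [0] * num_warps  # first cycle at which each warp may issue again
    cycle = busy = 0
    last = -1                   # warp that issued most recently (for round-robin)

    while any(p < len(program) for p in pc):
        for i in range(1, num_warps + 1):
            w = (last + i) % num_warps
            if pc[w] < len(program) and ready_at[w] <= cycle:
                op = program[pc[w]]
                pc[w] += 1
                busy += 1
                ready_at[w] = cycle + (mem_latency if op == "load" else 1)
                last = w
                break  # only one instruction is issued per cycle
        # If no warp was ready, this cycle is an idle (stalled) cycle.
        cycle += 1
    return cycle, busy


if __name__ == "__main__":
    for warps in (1, 2, 4, 8, 16, 32):
        total, busy = simulate(warps)
        print(f"warps={warps:2d}  cycles={total:4d}  utilization={busy / total:.2f}")
```

With these made-up parameters the script reports utilization of 0.17, 0.31, 0.50, 0.73, 0.94 and 1.00 for 1, 2, 4, 8, 16 and 32 warps: a single warp spends most cycles stalled on memory, whereas enough resident warps keep the issue slot busy while others wait.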
Single instruction, multiple threads: Difference between revisions