Single instruction, multiple threads

Single instruction, multiple threads (SIMT) is a parallel execution model, used in some GPGPU platforms, where dynamic multithreading is simulated by SIMD processors. The processors, say a number $p$ of them, appear to execute many more than $p$ tasks. The threads (or tasks) are in fact partitioned into blocks that map onto the processors, and each block executes its tasks in lock-step.^[1]
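
The partitioning and lock-step execution described above can be sketched in plain Python. This is only an illustrative model, not how any particular GPU is implemented: the names `partition` and `run_lockstep`, the single per-thread register, and the block width of 4 are all assumptions made for the example.

```python
from typing import Callable, List

# Hypothetical instruction type: takes (thread id, register value), returns new register value.
Instruction = Callable[[int, int], int]


def partition(num_threads: int, width: int) -> List[List[int]]:
    """Split thread ids 0..num_threads-1 into consecutive blocks of at most `width` ids."""
    return [
        list(range(start, min(start + width, num_threads)))
        for start in range(0, num_threads, width)
    ]


def run_lockstep(program: List[Instruction], num_threads: int, width: int) -> List[int]:
    """Run `program` for every thread, one block at a time."""
    state = [0] * num_threads  # one register per thread (an assumption of this sketch)
    for block in partition(num_threads, width):
        for instr in program:      # the block advances one instruction at a time ...
            for tid in block:      # ... and that instruction is applied to every thread in it
                state[tid] = instr(tid, state[tid])
    return state


program: List[Instruction] = [
    lambda tid, r: tid,      # r = thread id
    lambda tid, r: r * 2,    # r = r * 2
    lambda tid, r: r + 1,    # r = r + 1
]

blocks = partition(10, 4)
result = run_lockstep(program, num_threads=10, width=4)
```

Here ten threads are served by blocks of four lanes, so the last block is only partly filled. No thread in a block moves on to the next instruction until every thread in that block has completed the current one.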

SIMT was introduced by Nvidia:^[2]^[3]

[The G80 Nvidia GPU architecture] introduced the single-instruction multiple-thread (SIMT) execution model where multiple independent threads execute concurrently using a single instruction.

SIMT is intended to limit instruction fetching overhead,^[4] and is used in modern GPUs (including, but not limited to, those of Nvidia and AMD) in combination with "latency hiding" to enable high-performance execution despite considerable latency in memory-access operations.^[5]
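
As a rough illustration of the fetch-overhead argument, the following sketch counts instruction fetches under two simplified models. It assumes a straight-line program with no divergent branches and one fetch per instruction per block. The function names and the figures (1024 threads, 100 instructions, blocks of 32) are hypothetical values chosen for the example, not measurements.

```python
import math


def fetches_independent(num_threads: int, program_length: int) -> int:
    """Every thread fetches every instruction for itself."""
    return num_threads * program_length


def fetches_simt(num_threads: int, program_length: int, width: int) -> int:
    """Each block of up to `width` threads shares a single fetch per instruction."""
    num_blocks = math.ceil(num_threads / width)
    return num_blocks * program_length


independent = fetches_independent(1024, 100)
shared = fetches_simt(1024, 100, width=32)
```

Under these assumptions the shared-fetch model issues `width` times fewer fetches whenever the thread count is a multiple of the block width.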

A downside of SIMT execution is that control flow has to be simulated using masking: when a processor hits an if-then-else block and its threads take different paths through the block, all threads actually pass through the whole block, but for threads that take the if part the else part is "masked out", and vice versa. A benefit is inexpensive synchronization.^[1]
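
The masking scheme can be modelled with a small Python sketch. It is a simplified simulation, not vendor code: the function `simt_if_else` and the particular branch bodies (halve even values, map odd values to `3 * v + 1`) are hypothetical choices made for the example.

```python
from typing import List


def simt_if_else(values: List[int]) -> List[int]:
    """Simulate `if v is even: v // 2 else: 3 * v + 1` for one block of lanes."""
    out = list(values)
    mask = [v % 2 == 0 for v in values]  # every lane evaluates the condition

    # Pass 1: the "if" part is issued to the whole block;
    # lanes whose mask is False are masked out and keep their old value.
    for lane, v in enumerate(values):
        if mask[lane]:
            out[lane] = v // 2

    # Pass 2: the "else" part is issued to the whole block with the mask inverted.
    for lane, v in enumerate(values):
        if not mask[lane]:
            out[lane] = 3 * v + 1

    return out


masked_result = simt_if_else([1, 2, 3, 4])
```

Both passes run over the full block regardless of how many lanes are active, so in this model a divergent branch costs the time of the if part plus the time of the else part.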

References

[1] Michael McCool; James Reinders; Arch Robison (2013). Structured Parallel Programming: Patterns for Efficient Computation. Elsevier. pp. 209 ff.
[2] "Nvidia Fermi Compute Architecture Whitepaper" (PDF). http://www.nvidia.com/. NVIDIA Corporation. 2009. Retrieved 2014-07-17.
[3] "NVIDIA Tesla: A Unified Graphics and Computing Architecture". http://www.ieee.org/. IEEE. 2008. p. 6 (Subscription required.). Retrieved 2014-08-07.
[4] Rul, Sean; Vandierendonck, Hans; D’Haene, Joris; De Bosschere, Koen (2010). An experimental study on performance portability of OpenCL kernels. Symp. Application Accelerators in High Performance Computing (SAAHPC).
[5] "Advanced Topics in CUDA" (PDF). cc.gatech.edu. 2011. Retrieved 2014-08-28.
