Content deleted Content added
→Description: the description of SIMT is a mess. SIMD is compared with SIMT but misses out that '''both''' can have masks. Tags: Mobile edit Mobile web edit Advanced mobile edit |
→Description: the book is not accessible publicly Tags: Mobile edit Mobile web edit Advanced mobile edit |
||
Line 35:
SIMT is intended to limit [[instruction fetching]] overhead,<ref>{{cite conference |first1=Sean |last1=Rul |first2=Hans |last2=Vandierendonck |first3=Joris |last3=D’Haene |first4=Koen |last4=De Bosschere |title=An experimental study on performance portability of OpenCL kernels |year=2010 |conference=Symp. Application Accelerators in High Performance Computing (SAAHPC)|hdl=1854/LU-1016024 |hdl-access=free }}</ref> i.e. the latency that comes with memory access, and is used in modern GPUs (such as those of [[Nvidia|NVIDIA]] and [[AMD]]) in combination with 'latency hiding' to enable high-performance execution despite considerable latency in memory-access operations. This{{Which|date=February 2025}} is where the processor is oversubscribed with computation tasks, and is able to quickly switch between tasks when it would otherwise have to wait on memory. This strategy is comparable to [[Hyperthreading|hyperthreading in CPUs]].<ref>{{cite web |url=http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/12-advanced_topics_in_cuda.pdf |title=Advanced Topics in CUDA |date=2011 |website=cc.gatech.edu |access-date=2014-08-28}}</ref> As with SIMD, another major benefit is the sharing of the control logic by many data lanes, leading to an increase in computational density. One block of control logic can manage N data lanes, instead of replicating the control logic N times.
A downside of SIMT execution is the fact that [[Predication_(computer_architecture)#SIMD,_SIMT_and_vector_predication|"predicate masking"]] is the only
{| class="wikitable" style="text-align: center"
|