Single instruction, multiple threads
SIMT is intended to limit [[instruction fetching]] overhead,<ref>{{cite conference |first1=Sean |last1=Rul |first2=Hans |last2=Vandierendonck |first3=Joris |last3=D’Haene |first4=Koen |last4=De Bosschere |title=An experimental study on performance portability of OpenCL kernels |year=2010 |conference=Symp. Application Accelerators in High Performance Computing (SAAHPC)|hdl=1854/LU-1016024 |hdl-access=free }}</ref> i.e. the cost of fetching and decoding instructions, since a single fetch serves many data lanes. It is used in modern GPUs (such as those of [[Nvidia|NVIDIA]] and [[AMD]]) in combination with "latency hiding" to enable high-performance execution despite considerable latency in memory-access operations. As with SIMD, another major benefit is the sharing of the control logic by many data lanes, leading to an increase in computational density: one block of control logic can manage N data lanes, instead of replicating the control logic N times.
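The sketch below (a minimal, hypothetical CUDA kernel, not drawn from the cited sources) illustrates the model: the kernel body is a single instruction stream executed in lockstep by many data lanes, each operating on its own element, and launching far more threads than there are lanes lets the hardware scheduler hide memory latency by switching to groups of threads that are ready to run.

<syntaxhighlight lang="cuda">
// Minimal illustrative SAXPY kernel (assumed example): one instruction
// stream, the kernel body, drives many data lanes at once.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    // Each lane (thread) computes its own index into the data.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];  // same instruction, different data per lane
}

// Launching many more threads than physical lanes allows the scheduler to
// swap in ready thread groups while others wait on memory (latency hiding):
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
</syntaxhighlight>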
 
A downside of SIMT execution is that, because all threads in a group share a single program counter, [[Predication_(computer_architecture)#SIMD,_SIMT_and_vector_predication|"predicate masking"]] is the only strategy available to control execution of individual processing elements, leading to poor utilization in algorithms with divergent control flow.
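As an illustration, the hypothetical CUDA kernel below contains a data-dependent branch. Under predicate masking, lanes for which the condition is false are masked off while the taken path executes, then the mask is inverted for the other path, so the two paths run one after the other and utilization drops accordingly.

<syntaxhighlight lang="cuda">
// Hypothetical kernel (assumed example) showing divergence under a
// shared program counter: both branch paths are serialized.
__global__ void divergent(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] % 2 == 0)
        out[i] = in[i] * 2;  // lanes failing the test are masked off here...
    else
        out[i] = in[i] + 1;  // ...then the mask flips and these lanes run,
                             // while the others idle; utilization is halved
}
</syntaxhighlight>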
 
== Terminology ==