Single instruction, multiple threads
SIMT is intended to limit [[instruction fetching]] overhead,<ref>{{cite conference |first1=Sean |last1=Rul |first2=Hans |last2=Vandierendonck |first3=Joris |last3=D’Haene |first4=Koen |last4=De Bosschere |title=An experimental study on performance portability of OpenCL kernels |year=2010 |conference=Symp. Application Accelerators in High Performance Computing (SAAHPC)|hdl=1854/LU-1016024 |hdl-access=free }}</ref> i.e. the cost of fetching and decoding instructions, since a single fetch serves many data lanes. It is used in modern GPUs (such as those of [[Nvidia|NVIDIA]] and [[AMD]]) in combination with "latency hiding" to enable high-performance execution despite considerable latency in memory-access operations. As with SIMD, another major benefit is the sharing of the control logic by many data lanes, leading to an increase in computational density: one block of control logic can manage N data lanes, instead of replicating the control logic N times.
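The sketch below (a minimal, hypothetical CUDA kernel, not drawn from the cited sources) illustrates the model: the kernel body is a single instruction stream executed in lockstep by many data lanes, each operating on its own element, and launching far more threads than there are lanes lets the hardware scheduler hide memory latency by switching to groups of threads that are ready to run.

<syntaxhighlight lang="cuda">
// Minimal illustrative SAXPY kernel (assumed example): one instruction
// stream, the kernel body, drives many data lanes at once.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    // Each lane (thread) computes its own index into the data.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];  // same instruction, different data per lane
}

// Launching many more threads than physical lanes allows the scheduler to
// swap in ready thread groups while others wait on memory (latency hiding):
//   saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
</syntaxhighlight>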
 
A downside of SIMT execution is that, because all threads in a group share a single program counter, [[Predication_(computer_architecture)#SIMD,_SIMT_and_vector_predication|"predicate masking"]] is the only strategy available to control execution of individual processing elements, leading to poor utilization in algorithms with divergent control flow.
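As an illustration, the hypothetical CUDA kernel below contains a data-dependent branch. Under predicate masking, lanes for which the condition is false are masked off while the taken path executes, then the mask is inverted for the other path, so the two paths run one after the other and utilization drops accordingly.

<syntaxhighlight lang="cuda">
// Hypothetical kernel (assumed example) showing divergence under a
// shared program counter: both branch paths are serialized.
__global__ void divergent(const int *in, int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] % 2 == 0)
        out[i] = in[i] * 2;  // lanes failing the test are masked off here...
    else
        out[i] = in[i] + 1;  // ...then the mask flips and these lanes run,
                             // while the others idle; utilization is halved
}
</syntaxhighlight>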
 
== Terminology ==