Revision as of 04:58, 1 August 2025 by Citation bot (Bots, 5,865,207 edits). Edit summary: Added bibcode. \| Use this bot. Report bugs. \| Suggested by Abductive \| Category:Cleanup tagged articles with a reason field from July 2025 \| #UCB_Category 80/126
Revision as of 14:06, 1 August 2025 by Lkcl (Extended confirmed users, 3,004 edits). Edit summary: →Description: the description of SIMT is a mess. SIMD is compared with SIMT but misses out that '''both''' can have masks. Tags: Mobile edit, Mobile web edit, Advanced mobile edit
Line 35:
SIMT is intended to limit [[instruction fetching]] overhead,<ref>{{cite conference \|first1=Sean \|last1=Rul \|first2=Hans \|last2=Vandierendonck \|first3=Joris \|last3=D’Haene \|first4=Koen \|last4=De Bosschere \|title=An experimental study on performance portability of OpenCL kernels \|year=2010 \|conference=Symp. Application Accelerators in High Performance Computing (SAAHPC)\|hdl=1854/LU-1016024 \|hdl-access=free }}</ref> i.e. the latency that comes with memory access, and is used in modern GPUs (such as those of [[Nvidia\|NVIDIA]] and [[AMD]]) in combination with 'latency hiding' to enable high-performance execution despite considerable latency in memory-access operations. This{{Which\|date=February 2025}} is where the processor is oversubscribed with computation tasks, and is able to quickly switch between tasks when it would otherwise have to wait on memory. This strategy is comparable to [[Hyperthreading\|hyperthreading in CPUs]].<ref>{{cite web \|url=http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/12-advanced_topics_in_cuda.pdf \|title=Advanced Topics in CUDA \|date=2011 \|website=cc.gatech.edu \|access-date=2014-08-28}}</ref> As with SIMD, another major benefit is the sharing of the control logic by many data lanes, leading to an increase in computational density. One block of control logic can manage N data lanes, instead of replicating the control logic N times. A downside of SIMT execution is the fact that ~~thread-specific control-flow~~ "masking" is ~~performed~~ the ~~using~~ only way to control ~~"masking"~~ execution, leading to poor utilization ~~where a~~ in complex ~~processor's threads follow different control-flow paths~~ algorithms. For instance, to handle an ''IF''-''ELSE'' block where various threads of a processor execute different paths, all threads must actually process both paths (as all threads of a processor always execute in lock-step), but masking is used to disable and enable the various threads as appropriate. 
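The ''IF''-''ELSE'' handling described above can be illustrated with a small software simulation. This is only a sketch under stated assumptions, not how any particular GPU implements masking: the function name <code>simt_if_else</code>, the example branch bodies (halve even values, otherwise compute 3v + 1), and the "utilization" metric are all hypothetical and chosen for illustration. It also assumes that a path is skipped entirely when no lane needs it, which corresponds to the coherent control flow case.

```python
def simt_if_else(values):
    """Simulate one lock-step group of lanes executing:
        if v % 2 == 0: v = v // 2
        else:          v = 3 * v + 1
    Returns (new lane values, fraction of issued lane-slots doing useful work).
    Illustrative model only; names and metric are assumptions."""
    lanes = list(values)
    n = len(lanes)

    # All lanes evaluate the condition in the same step; the results form the mask.
    mask = [v % 2 == 0 for v in lanes]

    issued_slots = 0  # lane-slots occupied by issued instructions
    useful_slots = 0  # lane-slots whose result was actually kept

    # IF path: issued to the whole group, but only lanes with mask=True commit.
    if any(mask):
        for i in range(n):
            if mask[i]:
                lanes[i] = lanes[i] // 2
                useful_slots += 1
        issued_slots += n

    # ELSE path: issued to the whole group with the mask inverted.
    if not all(mask):
        for i in range(n):
            if not mask[i]:
                lanes[i] = 3 * lanes[i] + 1
                useful_slots += 1
        issued_slots += n

    utilization = useful_slots / issued_slots if issued_slots else 1.0
    return lanes, utilization
```

In this model, a group whose lanes all take the same branch issues only one path and keeps every lane busy, while a group split between the two branches issues both paths and leaves half of the lane-slots idle.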
Masking is avoided when control flow is coherent for the threads of a processor, i.e. they all follow the same path of execution. The masking strategy is what distinguishes SIMT from ordinary SIMD, and has the benefit of inexpensive synchronization between the threads of a processor.<ref>{{cite book \|author1=Michael McCool \|author2=James Reinders \|author3=Arch Robison \|title=Structured Parallel Programming: Patterns for Efficient Computation \|publisher=Elsevier \|year=2013 \|pages=209 ff}}</ref> {\| class="wikitable" style="text-align: center"

Single instruction, multiple threads: Difference between revisions