Revision as of 09:41, 3 July 2019 by Frap (talk \| contribs) (No edit summary)		Revision as of 21:25, 13 December 2019 by Citation bot (talk \| contribs) m (Alter: pages. \| You can use this bot yourself. Report bugs here.\| Activated by User:Nemo bis \| via #UCB_webform)
Line 16: SIMT is intended to limit [[instruction fetching]] overhead,<ref>{{cite conference \|first1=Sean \|last1=Rul \|first2=Hans \|last2=Vandierendonck \|first3=Joris \|last3=D’Haene \|first4=Koen \|last4=De Bosschere \|title=An experimental study on performance portability of OpenCL kernels \|year=2010 \|conference=Symp. Application Accelerators in High Performance Computing (SAAHPC)}}</ref> i.e. the latency that comes with memory access, and is used in modern GPUs (such as those of [[Nvidia]] and [[AMD]]) in combination with 'latency hiding' to enable high-performance execution despite considerable latency in memory-access operations. With latency hiding, the processor is oversubscribed with computation tasks and can quickly switch between them whenever it would otherwise have to wait on memory. This strategy is comparable to [[Multithreading (computer architecture)\|multithreading in CPUs]] (not to be confused with [[Multi-core processor\|multi-core]]).<ref>{{cite web \|url=http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/12-advanced_topics_in_cuda.pdf \|title=Advanced Topics in CUDA \|date=2011 \|website=cc.gatech.edu \|accessdate=2014-08-28}}</ref> A downside of SIMT execution is that thread-specific control flow is performed using "masking", which leads to poor utilization when a processor's threads follow different control-flow paths. For instance, to handle an ''IF''-''ELSE'' block where the threads of a processor take different paths, all threads must step through both paths (as all threads of a processor always execute in lock-step), with masking used to disable and enable individual threads as appropriate. Masking is avoided when control flow is coherent for the threads of a processor, i.e. when they all follow the same path of execution.
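The masking behaviour described above can be illustrated with a small simulation. This is only a sketch in plain Python, not GPU code: the function name `simt_if_else`, the eight-lane group size, and the "issued slots" counter are assumptions made for illustration and do not correspond to any vendor API or to the cited sources.

```python
from typing import Callable, List, Tuple


def simt_if_else(
    values: List[int],
    cond: Callable[[int], bool],
    then_op: Callable[[int], int],
    else_op: Callable[[int], int],
) -> Tuple[List[int], int, int]:
    """Simulate one IF-ELSE block executed in lock-step by a group of lanes.

    Returns (results, issued_slots, useful_slots). A slot is one lane taking
    part in one pass over a branch; a slot is useful only if the lane was
    enabled by the mask during that pass.
    """
    results = list(values)
    mask = [cond(v) for v in values]  # True = lane takes the IF branch
    issued = 0
    useful = 0

    # Pass over the IF branch: every lane steps through it, but only the
    # lanes enabled by the mask commit a result.
    if any(mask):
        issued += len(values)
        for lane, active in enumerate(mask):
            if active:
                results[lane] = then_op(values[lane])
                useful += 1

    # Pass over the ELSE branch with the inverted mask. It is skipped when
    # control flow is coherent, i.e. no lane needs this branch.
    if not all(mask):
        issued += len(values)
        for lane, active in enumerate(mask):
            if not active:
                results[lane] = else_op(values[lane])
                useful += 1

    return results, issued, useful


def is_even(x: int) -> bool:
    return x % 2 == 0


# Divergent case: even and odd lanes take different branches.
divergent = simt_if_else(list(range(8)), is_even, lambda x: x * 2, lambda x: x + 100)

# Coherent case: every lane takes the IF branch, so only one pass is issued.
coherent = simt_if_else([0, 2, 4, 6, 8, 10, 12, 14], is_even, lambda x: x * 2, lambda x: x + 100)

print(divergent)
print(coherent)
```

In the divergent call half of the issued slots do no useful work, whereas in the coherent call every issued slot is useful, which is the utilization gap the paragraph refers to.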
The masking strategy is what distinguishes SIMT from ordinary SIMD, and has the benefit of inexpensive synchronization between the threads of a processor.<ref>{{cite book \|author1=Michael McCool \|author2=James Reinders \|author3=Arch Robison \|title=Structured Parallel Programming: Patterns for Efficient Computation \|publisher=Elsevier \|year=2013 \|pages=209 ff.}}</ref>
{\| class="wikitable" style="font-size:80%; text-align: center"
! Nvidia [[CUDA]] \|\| [[OpenCL]] \|\| Hennessy & Patterson<ref>{{cite book \|author1=John L. Hennessy \|author2=David A. Patterson \|title=Computer Architecture: A Quantitative Approach \|publisher=Morgan Kaufmann \|edition=6 \|pages=314 ff.}}</ref>
\|-
\| Thread \|\| Work-item \|\| Sequence of SIMD Lane operations

Single instruction, multiple threads: Difference between revisions