Content deleted Content added
Obviously AMD doesn't own OpenCL, unlike CUDA |
typos etc.... article is a bit messy still |
||
Line 1:
'''Single instruction, multiple thread''' (SIMT) is an execution model used in [[parallel computing]] where [[single instruction, multiple data]] (SIMD) is combined with [[Thread (computing)#Multithreading|
The processors, say a number {{mvar|p}} of them, seem to execute many more than {{mvar|p}} tasks. This is achieved by each processor having multiple "threads" (or "work-items" or "Sequence of SIMD Lane operations"), which execute in lock-step, and are analogous to [[SIMD lanes]].<ref>{{cite book |author1=Michael McCool |author2=James Reinders |author3=Arch Robison |title=Structured Parallel Programming: Patterns for Efficient Computation |publisher=Elsevier |year=2013 |page=52}}</ref>
The SIMT execution model has been implemented on several
SIMT was introduced by [[Nvidia]]:<ref>{{cite web |url=http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf |title=Nvidia Fermi Compute Architecture Whitepaper |date=2009 |website=http://www.nvidia.com/ |publisher=NVIDIA Corporation |accessdate=2014-07-17}}</ref><ref name=teslaPaper>{{cite web |url=http://dx.doi.org/10.1109/MM.2008.31 |title=NVIDIA Tesla: A Unified Graphics and Computing Architecture |date=2008 |website=http://www.ieee.org/ |publisher=IEEE |accessdate=2014-08-07 |page=6 {{subscription required|s}} }}</ref>
Line 11:
[[ATI Technologies]] (now [[Advanced Micro Devices|AMD]]) released a competing product slightly later on May 14, 2007, the [[TeraScale (microarchitecture)#TeraScale 1|TeraScale 1]]-based ''"R600"'' GPU chip.
As access time of all the widespread [[random-access memory|RAM]] types (e.g. [[DDR SDRAM]], [[GDDR SDRAM]], [[XDR DRAM]], etc.) is still
SIMT is intended to limit [[instruction fetching]] overhead,<ref>{{cite conference |first1=Sean |last1=Rul |first2=Hans |last2=Vandierendonck |first3=Joris |last3=D’Haene |first4=Koen |last4=De Bosschere |title=An experimental study on performance portability of OpenCL kernels |year=2010 |conference=Symp. Application Accelerators in High Performance Computing (SAAHPC)}}</ref> i.e. the latency that comes with memory access, and is used in modern GPUs (including, but not limited to those of [[Nvidia]] and [[AMD]]) in combination with 'latency hiding' to enable high-performance execution despite considerable latency in memory-access operations. This is where the processor is oversubscribed with computation tasks, and is able to quickly switch between tasks when it would otherwise have to wait on memory. This strategy is comparable to [[Multithreading (computer architecture)|multithreading in CPUs]] (not to be confused with [[Multi-core processor|multi-core]]).<ref>{{cite web |url=http://www.cc.gatech.edu/~vetter/keeneland/tutorial-2011-04-14/12-advanced_topics_in_cuda.pdf |title=Advanced Topics in CUDA |date=2011 |website=cc.gatech.edu |accessdate=2014-08-28}}</ref>
|