'''Single instruction, multiple threads''' ('''SIMT''') is an execution model used in [[parallel computing]] where a single central "Control Unit" broadcasts an instruction to multiple "Processing Units" for them to all ''optionally'' perform simultaneous, synchronous, parallel execution of that one instruction on their own independent data. Each PU has its own independent data and address registers and its own independent memory, but no PU in the array has a [[Program counter]]. In [[Flynn's taxonomy|Flynn's 1972 taxonomy]] this arrangement is a variation of [[SIMD]] termed an '''array processor'''.
[[Image:ILLIAC_IV.jpg|thumb|[[ILLIAC IV]] Array overview, from the ARPA-funded introductory description by Stewart Denenberg, July 15, 1971.<ref>{{Cite web | title=Archived copy | url=https://apps.dtic.mil/sti/tr/pdf/ADA954882.pdf}}</ref>]]
The SIMT execution model has been implemented on several [[GPU]]s and is relevant for [[general-purpose computing on graphics processing units]] (GPGPU); for example, some [[supercomputer]]s combine CPUs with GPUs. In the [[ILLIAC IV]] the CPU was a [[Burroughs_Large_Systems#B6500,_B6700/B7700,_and_successors|Burroughs B6500]].
The key difference between SIMT and [[SIMD lanes]] is that each Processing Unit in the SIMT array has its own local memory, and may have a completely different stack pointer (and thus perform computations on completely different data sets), whereas the ALUs in SIMD lanes know nothing about memory per se and have no [[register file]].
This is illustrated by the [[ILLIAC IV]]. Each SIMT core was termed a Processing Element, and each PE had its own separate memory. Each PE had an "Index register", which was an address into its PEM.<ref>https://www.researchgate.net/publication/2992993_The_Illiac_IV_system {{Bare URL inline|date=July 2025}}</ref><ref>{{Cite web | title=Archived copy | url=https://apps.dtic.mil/sti/tr/pdf/ADA954882.pdf}}</ref>
In the [[ILLIAC IV]], the Burroughs B6500 primarily handled I/O, but also sent instructions to the Control Unit (CU), which then broadcast them to the PEs. The B6500, in its role as an I/O processor, also had access to ''all'' PEMs.
Additionally, each PE may be made active or inactive. If a given PE is inactive it will not execute the instruction broadcast to it by the Control Unit: instead it will sit idle until activated. Each PE can be said to be [[Predication_(computer_architecture)#SIMD,_SIMT_and_Vector_Predication|Predicated]].
The SIMT execution model is still only a way to present to the programmer what is fundamentally a predicated SIMD concept, and programs must be designed with predicated SIMD in mind. Because instruction issue (as a synchronous broadcast) is handled by the single Control Unit, SIMT cannot ''by design'' allow threads (PEs, lanes) to diverge by branching: only the Control Unit has a Program Counter. Branching is therefore to be avoided where possible.<ref>{{Cite web | title=SIMT Model - Open Source General-Purpose Computing Chip Platform - Blue Porcelain(GPGPU) | url=https://gpgpuarch.org/en/basic/simt/}}</ref>
<ref>{{Cite web | title=General-Purpose Graphics Processor Architecture - Chapter 3 - The SIMT Core: Instruction and Register Data Flow (Part 1) {{!}} FANnotes | url=https://www.fannotes.me/article/gpgpu_architecture/chapter_3_the_simt_core_instruction_and_register_data_flow_part_1}}</ref>
Also important is the difference between SIMT and [[SPMD]] (Single Program, Multiple Data). SPMD, like standard multi-core systems, has multiple Program Counters, whereas SIMT has only one: in the (single) Control Unit.
==History==
In [[Flynn's taxonomy]], Flynn's original papers cite two historic examples of SIMT processors termed "Array Processors": the [[ILLIAC IV#SOLOMON|SOLOMON]] and [[ILLIAC IV]].<ref>{{Cite web | title=Archived copy | url=https://apps.dtic.mil/sti/tr/pdf/ADA954882.pdf}}</ref>
SIMT was introduced by [[Nvidia|NVIDIA]] in the [[Tesla (microarchitecture)|Tesla GPU microarchitecture]] with the G80 chip.<ref>{{cite web |url=http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIA_Fermi_Compute_Architecture_Whitepaper.pdf |title=NVIDIA Fermi Compute Architecture Whitepaper |date=2009 |website=www.nvidia.com |publisher=NVIDIA Corporation |access-date=2014-07-17}}</ref><ref name=teslaPaper>{{cite journal |title=NVIDIA Tesla: A Unified Graphics and Computing Architecture |date=2008 |page=6 {{subscription required|s}} |doi=10.1109/MM.2008.31 |volume=28 |issue=2 |journal=IEEE Micro|last1=Lindholm |first1=Erik |last2=Nickolls |first2=John |last3=Oberman |first3=Stuart |last4=Montrym |first4=John |s2cid=2793450 }}</ref> [[ATI Technologies]], now [[Advanced Micro Devices|AMD]], released a competing product slightly later on May 14, 2007, the [[TeraScale (microarchitecture)#TeraScale 1|TeraScale 1]]-based ''"R600"'' GPU chip.
=== MIAOW GPU ===
[[File:MIAOW_GPU_diagram.png|thumb|MIAOW GPU and associated Computation Unit block diagram.<ref>{{Cite web | title=Architecture · VerticalResearchGroup/miaow Wiki · GitHub | url=https://github.com/VerticalResearchGroup/miaow/wiki/Architecture}}</ref>]]
The MIAOW Project by the Vertical Research Group is an implementation of the AMDGPU "Southern Islands" architecture.<ref>{{Cite web | title=Vertical Research Group {{!}} Main / Projects | url=https://research.cs.wisc.edu/vertical/wiki/index.php/Main/Projects#miaow}}</ref>
=== GPU Simulator ===
[[File:Vortex microarchitecture.png|thumb|Vortex SIMT GPU Microarchitecture diagram]]
The Vortex GPU is an open-source [[GPGPU]] project by the [[Georgia Institute of Technology]] that runs [[OpenCL]]. Technical details:<ref>{{Cite web | title=vortex/docs/microarchitecture.md at master · vortexgpgpu/vortex · GitHub | url=https://github.com/vortexgpgpu/vortex/blob/master/docs/microarchitecture.md}}</ref>
Note a key defining characteristic of SIMT: the ''PC is shared''. However, note also that time-multiplexing is used, giving the impression that there are more Array Processing Elements than there actually are.