Single instruction, multiple threads: Difference between revisions

Content deleted Content added
Description: the book is not accessible publicly
Tags: Mobile edit Mobile web edit Advanced mobile edit
shuffle paragraphs around (no change to actual words).
Tags: Mobile edit Mobile web edit Advanced mobile edit
Line 10:
 
The processors, say a number {{mvar|p}} of them, seem to execute many more than {{mvar|p}} tasks. This is achieved by each processor having multiple "threads" (or "work-items" or "Sequence of SIMD Lane operations"), which execute in lock-step, and are analogous to [[SIMD lanes]].<ref>{{cite book |author1=Michael McCool |author2=James Reinders |author3=Arch Robison |title=Structured Parallel Programming: Patterns for Efficient Computation |publisher=Elsevier |year=2013 |page=52}}</ref>
 
The SIMT execution model is still only a way to present to the programmer what is fundamentally still a Predicated SIMD concept. Programs must be designed with Predicated SIMD in mind. With Instruction Issue (as a synchronous broadcast) being handled by the single Control Unit, SIMT cannot ''by design'' allow threads (PEs, Lanes) to diverge by branching, because only the Control Unit has a Program Counter. If possible, therefore, branching is to be avoided.<ref>{{Cite web | title=SIMT Model - Open Source General-Purpose Computing Chip Platform - Blue Porcelain(GPGPU) | url=https://gpgpuarch.org/en/basic/simt/ | access-date=2025-07-30 | website=gpgpuarch.org}}</ref>
<ref>{{Cite web | title=General-Purpose Graphics Processor Architecture - Chapter 3 - The SIMT Core: Instruction and Register Data Flow (Part 1) {{!}} FANnotes | url=https://www.fannotes.me/article/gpgpu_architecture/chapter_3_the_simt_core_instruction_and_register_data_flow_part_1 | access-date=2025-07-30 | website=www.fannotes.me}}</ref>
 
=== Differences from other models ===
 
The simplest way to understand SIMT is to imagine a multi-core ([[Multiple_instruction,_multiple_data|MIMD]]) system, where each core has its own register file, its own [[Arithmetic logic unit|ALUs]] (both SIMD and Scalar) and its own data cache, but that unlike a standard multi-core system which has multiple independent instruction caches and decoders, as well as multiple independent Program Counter registers, the instructions are synchronously '''broadcast''' to all SIMT cores from a '''single''' unit with a single instruction cache and a single instruction decoder which reads instructions using a single Program Counter.
Line 18 ⟶ 23:
 
Additionally, each PE may be made active or inactive. If a given PE is inactive it will not execute the instruction broadcast to it by the Control Unit: instead it will sit idle until activated. Each PE can be said to be [[Predication_(computer_architecture)#SIMD,_SIMT_and_Vector_Predication|Predicated]].
 
The SIMT execution model is still only a way to present to the programmer what is fundamentally still a Predicated SIMD concept. Programs must be designed with Predicated SIMD in mind. With Instruction Issue (as a synchronous broadcast) being handled by the single Control Unit, SIMT cannot ''by design'' allow threads (PEs, Lanes) to diverge by branching, because only the Control Unit has a Program Counter. If possible, therefore, branching is to be avoided.<ref>{{Cite web | title=SIMT Model - Open Source General-Purpose Computing Chip Platform - Blue Porcelain(GPGPU) | url=https://gpgpuarch.org/en/basic/simt/ | access-date=2025-07-30 | website=gpgpuarch.org}}</ref>
<ref>{{Cite web | title=General-Purpose Graphics Processor Architecture - Chapter 3 - The SIMT Core: Instruction and Register Data Flow (Part 1) {{!}} FANnotes | url=https://www.fannotes.me/article/gpgpu_architecture/chapter_3_the_simt_core_instruction_and_register_data_flow_part_1 | access-date=2025-07-30 | website=www.fannotes.me}}</ref>
 
Also important to note is the difference between SIMT and [[SPMD]] - Single Program Multiple Data. SPMD, like standard multi-core systems, has multiple Program Counters, where SIMT only has one: in the (one) Control Unit.