SIMD has three subcategories in [[Flynn's taxonomy#Single instruction stream, multiple data streams (SIMD)|Flynn's 1972 taxonomy]], one of which is [[single instruction, multiple threads]] (SIMT). SIMT should not be confused with [[Thread (computing)|software threads]] or [[Multithreading (computer architecture)|hardware threads]], both of which are forms of task time-sharing (time-slicing); SIMT is true simultaneous parallel hardware-level execution, as in the [[ILLIAC IV]].
 
One key distinction between SIMT and SIMD is that the SIMD unit does not have its own memory (a SIMT system could ''use'' SIMD units, usually termed [[SIMD lanes]]).
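 
A minimal CUDA sketch of the lane idea (the kernel and variable names here are illustrative assumptions, not drawn from a specific vendor document): each SIMT thread occupies one SIMD lane, the lanes share a single instruction stream, and each lane holds only private registers, while the data lives in memory owned by the wider system rather than by any lane.

<syntaxhighlight lang="cuda">
#include <cstdio>

// Each thread is one SIMD lane: same instruction stream, private
// registers only; the arrays live in device memory owned by the
// wider SIMT system, not by any individual lane.
__global__ void lane_add(const float *a, const float *b, float *c, int n) {
    int lane = blockIdx.x * blockDim.x + threadIdx.x;  // lane index
    if (lane < n)
        c[lane] = a[lane] + b[lane];  // one lane, one data element
}

int main(void) {
    const int n = 32;  // one warp's worth of lanes
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    lane_add<<<1, n>>>(da, db, dc, n);  // 32 lanes execute in lock-step
    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("c[5] = %g\n", hc[5]);  // 5 + 10 = 15
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
</syntaxhighlight>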
Another key distinction in SIMT is the presence of control-flow mechanisms such as warps ([[Nvidia]] terminology) or wavefronts ([[Advanced Micro Devices]] (AMD) terminology); the [[ILLIAC IV]] simply called them "control signals". These allow divergence and convergence of threads, even under shared instruction streams, thereby offering slightly more flexibility than classical [[SIMD within a register]].{{clarify|reason=Is classical SIMD one of the subcategories in Flynn's 1972 paper? If so, which subcategory?|date=July 2025}}
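 
The following sketch (again with hypothetical names and values) shows divergence under a shared instruction stream: the threads of one warp disagree at a branch, the hardware's masking mechanism serializes the two paths, and the threads reconverge afterwards.

<syntaxhighlight lang="cuda">
#include <cstdio>

// Threads in the same warp share one instruction stream; when they
// disagree at the branch below, the hardware masks lanes and runs
// the two paths one after the other, then reconverges.
__global__ void diverge(int *out) {
    int i = threadIdx.x;
    if ((i & 1) == 0)
        out[i] = 2 * i;    // even lanes execute while odd lanes idle
    else
        out[i] = i + 100;  // then odd lanes execute while even idle
    // reconvergence point: all 32 lanes proceed together again
}

int main(void) {
    const int n = 32;      // exactly one warp
    int *dout, hout[n];
    cudaMalloc(&dout, n * sizeof(int));
    diverge<<<1, n>>>(dout);
    cudaMemcpy(hout, dout, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("%d ", hout[i]);
    printf("\n");
    cudaFree(dout);
    return 0;
}
</syntaxhighlight>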
 
Each hardware element (PU, or PE in [[ILLIAC IV]] terminology) working on an individual data item is sometimes referred to as a SIMD lane or channel. Modern [[graphics processing unit]]s (GPUs) are often wide SIMD implementations (typically more than 16 data lanes or channels).{{cn|date=July 2024}} Some newer GPUs go beyond simple SIMD and integrate mixed-precision SIMD pipelines, which allow concurrent execution of [[8-bit computing|8-bit]], [[16-bit computing|16-bit]], and [[32-bit computing|32-bit]] operations in different lanes. This is critical for applications such as AI inference, where mixed precision boosts throughput.
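 
As one concrete mixed-precision example, CUDA's <code>__dp4a</code> intrinsic (available on devices of compute capability 6.1 and newer; the surrounding harness below is an illustrative sketch, not a reference implementation) multiplies four packed 8-bit lanes and accumulates the products into a single 32-bit result in one instruction, the pattern that boosts throughput in AI inference.

<syntaxhighlight lang="cuda">
#include <cstdio>

// __dp4a treats each 32-bit int as four signed 8-bit lanes: four
// 8-bit multiplies accumulate into one 32-bit integer per instruction
// (requires compute capability 6.1+, e.g. compile with -arch=sm_61).
__global__ void dot8(const int *a, const int *b, int *out, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i)
        acc = __dp4a(a[i], b[i], acc);
    *out = acc;
}

int main(void) {
    // Four int8 lanes packed into one 32-bit word each:
    // a = {4, 3, 2, 1}, b = {1, 1, 1, 1}
    int ha = 0x01020304, hb = 0x01010101, hout;
    int *da, *db, *dout;
    cudaMalloc(&da, sizeof(int));
    cudaMalloc(&db, sizeof(int));
    cudaMalloc(&dout, sizeof(int));
    cudaMemcpy(da, &ha, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, &hb, sizeof(int), cudaMemcpyHostToDevice);

    dot8<<<1, 1>>>(da, db, dout, 1);
    cudaMemcpy(&hout, dout, sizeof(int), cudaMemcpyDeviceToHost);
    printf("dot = %d\n", hout);  // 4*1 + 3*1 + 2*1 + 1*1 = 10
    cudaFree(da); cudaFree(db); cudaFree(dout);
    return 0;
}
</syntaxhighlight>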