SIMD has three subcategories in [[Flynn's taxonomy#Single instruction stream, multiple data streams (SIMD)|Flynn's 1972 taxonomy]], one of which is [[single instruction, multiple threads]] (SIMT). SIMT should not be confused with [[Thread (computing)|software threads]] or [[Multithreading (computer architecture)|hardware threads]], both of which are forms of task time-sharing (time-slicing); SIMT is true simultaneous parallel hardware-level execution, as in the [[ILLIAC IV]].
 
One key distinction between SIMT and SIMD is that the SIMD unit does not have its own memory (a SIMT system could ''use'' SIMD units, usually termed [[SIMD lanes]]).
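 
A minimal CUDA sketch of the lane idea (the kernel and variable names here are illustrative assumptions, not drawn from a specific vendor document): each SIMT thread occupies one SIMD lane, the lanes share a single instruction stream, and each lane holds only private registers, while the data lives in memory owned by the wider system rather than by any lane.

<syntaxhighlight lang="cuda">
#include <cstdio>

// Each thread is one SIMD lane: same instruction stream, private
// registers only; the arrays live in device memory owned by the
// wider SIMT system, not by any individual lane.
__global__ void lane_add(const float *a, const float *b, float *c, int n) {
    int lane = blockIdx.x * blockDim.x + threadIdx.x;  // lane index
    if (lane < n)
        c[lane] = a[lane] + b[lane];  // one lane, one data element
}

int main(void) {
    const int n = 32;  // one warp's worth of lanes
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = (float)i; hb[i] = 2.0f * i; }

    float *da, *db, *dc;
    cudaMalloc(&da, n * sizeof(float));
    cudaMalloc(&db, n * sizeof(float));
    cudaMalloc(&dc, n * sizeof(float));
    cudaMemcpy(da, ha, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, n * sizeof(float), cudaMemcpyHostToDevice);

    lane_add<<<1, n>>>(da, db, dc, n);  // 32 lanes execute in lock-step
    cudaMemcpy(hc, dc, n * sizeof(float), cudaMemcpyDeviceToHost);

    printf("c[5] = %g\n", hc[5]);  // 5 + 10 = 15
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
</syntaxhighlight>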
Another key distinction in SIMT is the presence of control-flow mechanisms such as warps ([[Nvidia]] terminology) or wavefronts ([[Advanced Micro Devices]] (AMD) terminology); the [[ILLIAC IV]] simply called them "control signals". These allow divergence and convergence of threads, even under shared instruction streams, thereby offering slightly more flexibility than classical [[SIMD within a register]].{{clarify|reason=Is classical SIMD one of the subcategories in Flynn's 1972 paper? If so, which subcategory?|date=July 2025}}
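 
The following sketch (again with hypothetical names and values) shows divergence under a shared instruction stream: the threads of one warp disagree at a branch, the hardware's masking mechanism serializes the two paths, and the threads reconverge afterwards.

<syntaxhighlight lang="cuda">
#include <cstdio>

// Threads in the same warp share one instruction stream; when they
// disagree at the branch below, the hardware masks lanes and runs
// the two paths one after the other, then reconverges.
__global__ void diverge(int *out) {
    int i = threadIdx.x;
    if ((i & 1) == 0)
        out[i] = 2 * i;    // even lanes execute while odd lanes idle
    else
        out[i] = i + 100;  // then odd lanes execute while even idle
    // reconvergence point: all 32 lanes proceed together again
}

int main(void) {
    const int n = 32;      // exactly one warp
    int *dout, hout[n];
    cudaMalloc(&dout, n * sizeof(int));
    diverge<<<1, n>>>(dout);
    cudaMemcpy(hout, dout, n * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("%d ", hout[i]);
    printf("\n");
    cudaFree(dout);
    return 0;
}
</syntaxhighlight>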
 
Each hardware element (PU, or PE in [[ILLIAC IV]] terminology) working on an individual data item is sometimes referred to as a SIMD lane or channel. Modern [[graphics processing unit]]s (GPUs) are often wide SIMD implementations (typically more than 16 data lanes or channels).{{cn|date=July 2024}} Some newer GPUs go beyond simple SIMD and integrate mixed-precision SIMD pipelines, which allow concurrent execution of [[8-bit computing|8-bit]], [[16-bit computing|16-bit]], and [[32-bit computing|32-bit]] operations in different lanes. This is critical for applications such as AI inference, where mixed precision boosts throughput.
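 
As one concrete mixed-precision example, CUDA's <code>__dp4a</code> intrinsic (available on devices of compute capability 6.1 and newer; the surrounding harness below is an illustrative sketch, not a reference implementation) multiplies four packed 8-bit lanes and accumulates the products into a single 32-bit result in one instruction, the pattern that boosts throughput in AI inference.

<syntaxhighlight lang="cuda">
#include <cstdio>

// __dp4a treats each 32-bit int as four signed 8-bit lanes: four
// 8-bit multiplies accumulate into one 32-bit integer per instruction
// (requires compute capability 6.1+, e.g. compile with -arch=sm_61).
__global__ void dot8(const int *a, const int *b, int *out, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i)
        acc = __dp4a(a[i], b[i], acc);
    *out = acc;
}

int main(void) {
    // Four int8 lanes packed into one 32-bit word each:
    // a = {4, 3, 2, 1}, b = {1, 1, 1, 1}
    int ha = 0x01020304, hb = 0x01010101, hout;
    int *da, *db, *dout;
    cudaMalloc(&da, sizeof(int));
    cudaMalloc(&db, sizeof(int));
    cudaMalloc(&dout, sizeof(int));
    cudaMemcpy(da, &ha, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, &hb, sizeof(int), cudaMemcpyHostToDevice);

    dot8<<<1, 1>>>(da, db, dout, 1);
    cudaMemcpy(&hout, dout, sizeof(int), cudaMemcpyDeviceToHost);
    printf("dot = %d\n", hout);  // 4*1 + 3*1 + 2*1 + 1*1 = 10
    cudaFree(da); cudaFree(db); cudaFree(dout);
    return 0;
}
</syntaxhighlight>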