Single instruction, multiple data: Difference between revisions

Another key distinction in SIMT is the presence of control flow mechanisms like warps ([[Nvidia]] terminology) or wavefronts (Advanced Micro Devices ([[AMD]]) terminology). [[ILLIAC IV]] simply called them "Control Signals". These allow divergence and convergence of threads, even under shared instruction streams, thereby offering slightly more flexibility than classical [[SIMD within a register]].{{clarify|reason=Is classical SIMD one of the subcategories in Flynn's 1972 paper? If so, which subcategory?|date=July 2025}}
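One rough way to picture divergence under a shared instruction stream is masking. Each branch path is issued once for the whole group of lanes, a per-lane execution mask decides which lanes commit results, and afterwards the lanes reconverge. The Python sketch below is only a toy model under that assumption, not a description of Nvidia's or AMD's actual hardware. The function <code>masked_if_else</code> and its parameters are hypothetical.

```python
from typing import Callable, List, Tuple


def masked_if_else(
    lanes: List[int],
    cond: Callable[[int], bool],
    then_op: Callable[[int], int],
    else_op: Callable[[int], int],
) -> Tuple[List[int], int]:
    """Toy model of one warp/wavefront running `if cond: then_op else: else_op`.

    Returns the per-lane results and how many branch paths were issued.
    """
    # Per-lane predicate: the execution mask for the "then" path.
    mask = [cond(x) for x in lanes]
    out = list(lanes)
    paths_issued = 0
    # Each path is issued once for the whole group; only lanes whose mask bit
    # is set commit a result, the rest keep their previous value.
    for active, op in ((mask, then_op), ([not m for m in mask], else_op)):
        if any(active):  # a path that no lane takes is skipped entirely
            paths_issued += 1
            out = [op(x) if a else o for x, a, o in zip(lanes, active, out)]
    # Both paths done: lanes reconverge onto a single instruction stream.
    return out, paths_issued


print(masked_if_else([1, 2, 3, 4], lambda x: x % 2 == 0,
                     lambda x: x * 10, lambda x: -x))  # ([-1, 20, -3, 40], 2)
```

In this model, when lanes disagree both paths are issued one after the other, which is why divergent branches reduce throughput.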
 
Each hardware element (PU, or PE in [[ILLIAC IV]] terminology) working on an individual data item is sometimes also referred to as a [[SIMD lane]] or channel, although the ILLIAC IV PE was a scalar 64-bit unit. Modern [[graphics processing unit]]s (GPUs) are invariably wide [[SIMD within a register]] (SWAR) implementations and typically have more than 16 data lanes or channels of such processing elements.{{cn|date=July 2024}} Some newer GPUs go beyond simple SIMD and integrate mixed-precision SWAR pipelines,{{cn|date=July 2025}} which perform sub-word [[8-bit computing|8-bit]], [[16-bit computing|16-bit]], and [[32-bit computing|32-bit]] operations in different lanes. This is critical for applications like AI inference, where mixed precision boosts throughput.
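The SWAR idea of treating one wide register as several narrower lanes can be illustrated with lane-wise addition inside a single 64-bit integer, where masking stops carries from crossing lane boundaries. The Python sketch below assumes a 64-bit word and lanes of equal width. The helper names <code>lane_masks</code>, <code>swar_add</code>, <code>pack</code> and <code>unpack</code> are hypothetical, and the sketch does not model any particular GPU's mixed-precision pipeline.

```python
from typing import List, Tuple


def lane_masks(lane_bits: int, word_bits: int = 64) -> Tuple[int, int]:
    """Return (high, low): the top bit of every lane, and all remaining bits."""
    if lane_bits <= 0 or word_bits % lane_bits:
        raise ValueError("lane width must evenly divide the word width")
    high = 0
    for i in range(word_bits // lane_bits):
        high |= 1 << (i * lane_bits + lane_bits - 1)  # top bit of lane i
    low = ((1 << word_bits) - 1) ^ high
    return high, low


def swar_add(a: int, b: int, lane_bits: int, word_bits: int = 64) -> int:
    """Lane-wise addition modulo 2**lane_bits inside one machine word."""
    high, low = lane_masks(lane_bits, word_bits)
    # Adding only the low bits means a carry can reach a lane's top bit
    # but can never spill into the neighbouring lane.
    partial = (a & low) + (b & low)
    # XOR sets each top bit to a ^ b ^ carry-in; the lane's carry-out is
    # never produced, so every lane wraps modulo 2**lane_bits.
    return partial ^ ((a ^ b) & high)


def pack(values: List[int], lane_bits: int) -> int:
    """Place values[i] in lane i (lane 0 = least significant bits)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & ((1 << lane_bits) - 1)) << (i * lane_bits)
    return word


def unpack(word: int, lane_bits: int, count: int) -> List[int]:
    """Extract `count` lanes of width `lane_bits` from `word`."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(count)]


# The same 64-bit word viewed as eight 8-bit lanes or four 16-bit lanes.
a8 = pack([250, 1, 2, 3, 4, 5, 6, 7], 8)
b8 = pack([10] * 8, 8)
print(unpack(swar_add(a8, b8, 8), 8, 8))      # [4, 11, 12, 13, 14, 15, 16, 17]

a16 = pack([65535, 1, 2, 3], 16)
b16 = pack([1, 1, 1, 1], 16)
print(unpack(swar_add(a16, b16, 16), 16, 4))  # [0, 2, 3, 4]
```

Changing <code>lane_bits</code> reinterprets the same word as eight 8-bit, four 16-bit, or two 32-bit lanes. This sketch keeps every lane the same width, so it does not show different precisions in different lanes at the same time.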
 
SIMD should not be confused with [[Vector processing]].
Additionally, SIMD can exist in both fixed and scalable vector forms. Fixed-width SIMD units operate on a constant number of data points per instruction, while scalable designs, such as the RISC-V Vector extension (RVV) or Arm's Scalable Vector Extension (SVE), let the number of data elements processed per instruction vary with the hardware implementation, so the same code can run on processors with different vector register widths. This improves forward compatibility across generations of processors.
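The difference between the two forms shows up in how loops are written. Fixed-width code bakes the lane count into the program and usually needs a scalar tail for leftover elements, while vector-length-agnostic code asks on each iteration how many elements the hardware will process, similar in spirit to the RISC-V Vector <code>vsetvli</code> instruction. The Python sketch below only models this: <code>hw_lanes</code> is a hypothetical stand-in for the implementation's vector length, the fixed width of 4 is an arbitrary example, and the function names are illustrative.

```python
from typing import List


def add_fixed_width(x: List[int], y: List[int], width: int = 4) -> List[int]:
    """Fixed-width SIMD style: the lane count is baked into the code."""
    n = len(x)
    out = [0] * n
    main = n - n % width
    for i in range(0, main, width):
        # One "vector instruction" over exactly `width` lanes.
        out[i:i + width] = [a + b for a, b in zip(x[i:i + width], y[i:i + width])]
    for i in range(main, n):
        # Leftover elements need a separate scalar tail loop.
        out[i] = x[i] + y[i]
    return out


def add_length_agnostic(x: List[int], y: List[int], hw_lanes: int) -> List[int]:
    """Scalable style: each iteration asks how many lanes are available."""
    if hw_lanes < 1:
        raise ValueError("hardware must provide at least one lane")
    n = len(x)
    out = [0] * n
    i = 0
    while i < n:
        vl = min(hw_lanes, n - i)  # stand-in for a "set vector length" step
        out[i:i + vl] = [a + b for a, b in zip(x[i:i + vl], y[i:i + vl])]
        i += vl
    return out


x = list(range(10))
y = [100] * 10
# The same length-agnostic code is reused unchanged for any hardware width.
print(add_length_agnostic(x, y, 8))  # [100, 101, 102, 103, 104, 105, 106, 107, 108, 109]
```

Because the length-agnostic loop never hard-codes a width, running it with a larger <code>hw_lanes</code> simply takes fewer iterations, which is the forward-compatibility property described above.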
 
==History==