Revision as of 12:42, 25 July 2025 by Lkcl: added screenshot from ARPA-funded "unlimited distribution" document, July 15 1971, just for fits and giggles. the image is also used in an academic paper
Revision as of 12:51, 25 July 2025 by Lkcl: mention control signals
[[File:SIMD2.svg|thumb|Single instruction, multiple data]] '''Single instruction, multiple data''' ('''SIMD''') is a type of [[parallel computing]] (processing) in [[Flynn's taxonomy]]. SIMD describes computers with [[multiple processing elements]] that perform the same operation on multiple data points simultaneously. SIMD can be internal (part of the hardware design) and it can be directly accessible through an [[instruction set architecture]] (ISA), but it should not be confused with an ISA. Such machines exploit [[Data parallelism|data-level parallelism]], but not [[Concurrent computing|concurrency]]: there are simultaneous (parallel) computations, but each unit performs exactly the same instruction at any given moment (just with different data). A simple example is adding many pairs of numbers together: all of the SIMD units perform an addition, but each one has a different pair of values to add. SIMD is especially applicable to common tasks such as adjusting the contrast in a [[digital image]] or adjusting the volume of [[digital audio]]. Most modern [[central processing unit]] (CPU) designs include SIMD instructions to improve the performance of [[multimedia]] use. In recent CPUs, SIMD units are tightly coupled with cache hierarchies and prefetch mechanisms, which help hide memory latency during large block operations. For instance, on AVX-512-enabled processors a single 512-bit register holds 64 bytes, exactly one cache line on current x86 designs, and one fused multiply-add (FMA) instruction operates on sixteen single-precision values at once.

[[Image:ILLIAC_IV.jpg|thumb|[[ILLIAC IV]] array overview, from an ARPA-funded introductory description by Stewart Denenberg, July 15, 1971.<ref>https://apps.dtic.mil/sti/tr/pdf/ADA954882.pdf</ref>]] SIMD has three different subcategories in [[Flynn's taxonomy#Single instruction stream, multiple data streams (SIMD)|Flynn's 1972 Taxonomy]], one of which is [[single instruction, multiple threads]] (SIMT).
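The pair-wise addition and audio-volume examples from the opening paragraph can be modeled in a few lines of Python. This is a minimal sketch, not real vector code: the lane count `LANES = 4`, the helper names `simd`, `add`, `mul_q8_sat16` and `scale_volume`, and the 8.8 fixed-point gain format are all hypothetical choices for illustration. Each call to `simd` stands for one instruction applied to every lane; the Python loop only simulates what hardware does in lockstep.

```python
from typing import Callable, List, Sequence

LANES = 4  # hypothetical vector width chosen for illustration; real widths vary


def simd(op: Callable[[int, int], int], a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Model one SIMD instruction: the same operation applied to every lane.

    Hardware evaluates all lanes at once; the comprehension below only
    models that lockstep behaviour one lane after another.
    """
    if len(a) != LANES or len(b) != LANES:
        raise ValueError(f"operands must have exactly {LANES} lanes")
    return [op(x, y) for x, y in zip(a, b)]


def add(x: int, y: int) -> int:
    return x + y


def mul_q8_sat16(sample: int, gain: int) -> int:
    """Multiply a 16-bit sample by an 8.8 fixed-point gain, clamped to int16."""
    return max(-32768, min(32767, (sample * gain) >> 8))


def scale_volume(samples: List[int], gain: int) -> List[int]:
    """Adjust audio volume LANES samples per 'instruction', then finish the tail."""
    out: List[int] = []
    body = len(samples) - len(samples) % LANES
    gains = [gain] * LANES  # broadcast the scalar gain into every lane
    for i in range(0, body, LANES):
        out.extend(simd(mul_q8_sat16, samples[i:i + LANES], gains))
    # Leftover samples that do not fill a whole vector use scalar code.
    out.extend(mul_q8_sat16(s, gain) for s in samples[body:])
    return out


print(simd(add, [1, 2, 3, 4], [10, 20, 30, 40]))      # [11, 22, 33, 44]
print(scale_volume([100, -200, 30000, 4, 256], 512))  # gain 512 is 2.0: [200, -400, 32767, 8, 512]
```

Processing the array one full vector at a time and finishing the remainder with scalar code mirrors a common pattern in SIMD loops; the clamp to the 16-bit range imitates the saturating arithmetic many multimedia instruction sets offer.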
SIMT should not be confused with [[Thread (computing)|software threads]] or [[Multithreading (computer architecture)|hardware threads]], both of which are task time-sharing (time-slicing). SIMT is true simultaneous parallel hardware-level execution, such as in the [[ILLIAC IV]]. One key distinction between SIMT and SIMD is that the SIMD unit will not have its own memory (a SIMT unit could ''use'' a SIMD unit). Another key distinction in SIMT is the presence of control-flow mechanisms such as warps ([[Nvidia]] terminology) or wavefronts (Advanced Micro Devices ([[AMD]]) terminology). [[ILLIAC IV]] simply called them "Control Signals". These allow threads to diverge and reconverge even under a shared instruction stream, thereby offering slightly more flexibility than classical SIMD.{{clarify|reason=Is classical SIMD one of the subcategories in Flynn's 1972 paper? If so, which subcategory?|date=July 2025}} Each hardware element (PU, or PE in [[ILLIAC IV]] terminology) working on an individual data item is sometimes also referred to as a SIMD lane or channel. Modern [[graphics processing unit]]s (GPUs) are often wide SIMD implementations (typically more than 16 data lanes or channels).{{cn|date=July 2024}} Some newer GPUs go beyond simple SIMD and integrate mixed-precision SIMD pipelines, which allow concurrent execution of [[8-bit computing|8-bit]], [[16-bit computing|16-bit]], and [[32-bit computing|32-bit]] operations in different lanes. This is especially useful for applications such as AI inference, where mixed precision increases throughput.
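How a group of lanes can diverge and reconverge while sharing one instruction stream can be sketched with an active mask, a generic technique often called masked or predicated execution. The sketch below is an assumption-laden illustration, not a description of Nvidia's, AMD's or ILLIAC IV's actual hardware: the function name `masked_if_else` and the even/odd branch it evaluates are hypothetical. Both sides of the branch are issued to the whole group, and each lane commits a result only on the side its mask selects.

```python
from typing import List, Tuple


def masked_if_else(values: List[int]) -> Tuple[List[int], int]:
    """Run `v = v // 2 if v is even else 3 * v + 1` on every lane of one group.

    All lanes share one instruction stream, so a data-dependent branch is
    executed by issuing both paths and masking off the lanes that did not
    take each path. Returns the results and the number of paths issued.
    """
    # Every lane evaluates the condition with the same compare instruction.
    mask = [v % 2 == 0 for v in values]
    result = list(values)
    paths_issued = 0

    # "Then" path: issued to the whole group, committed only where mask is set.
    if any(mask):
        result = [v // 2 if m else r for v, m, r in zip(values, mask, result)]
        paths_issued += 1

    # "Else" path: issued with the inverted mask.
    if not all(mask):
        result = [r if m else 3 * v + 1 for v, m, r in zip(values, mask, result)]
        paths_issued += 1

    # Reconvergence: from here on every lane is active again.
    return result, paths_issued


print(masked_if_else([4, 7, 10, 3]))  # ([2, 22, 5, 10], 2): lanes diverge, both paths issued
print(masked_if_else([2, 4, 6, 8]))   # ([1, 2, 3, 4], 1): lanes agree, one path issued
```

The path count shows the cost of divergence in this model: when lanes disagree, the group spends time on both sides of the branch, while a uniform condition lets it skip the unused side entirely.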

Single instruction, multiple data: Difference between revisions