Single instruction, multiple data: Difference between revisions

Confusion between SIMT and SIMD: refer to subsection for ILLIAC IV Predication
Tags: Mobile edit Mobile web edit Advanced mobile edit
m Removing link(s) Wikipedia:Articles for deletion/Permute instruction closed as soft delete (XFDcloser)
 
(3 intermediate revisions by 2 users not shown)
Line 1:
{{Short description|Type of parallel processing}}
{{Redirect|SIMD|the cryptographic hash function|SIMD (hash function)|the Scottish statistical tool|Scottish index of multiple deprivation}}
{{Update|inaccurate=yes|date=March 2017}}
{{Flynn's Taxonomy}}
{{See also|SIMD within a register|Single instruction, multiple threads}}
{{Flynn's Taxonomy}}
{{Update|inaccurate=yes|date=March 2017}}
 
[[File:SIMD2.svg|thumb|Single instruction, multiple data]]
Line 14:
{{See also|SIMD within a register|Single instruction, multiple threads|Vector processor}}
 
[[Image:ILLIAC_IV.jpg|thumb|[[ILLIAC IV]] Array overview, from ARPA-funded Introductory description by Stewart Denenberg, July 15, 1971.<ref>{{Cite web | title=Archived copy | url=https://apps.dtic.mil/sti/tr/pdf/ADA954882.pdf | archive-url=https://web.archive.org/web/20240427173522/https://apps.dtic.mil/sti/tr/pdf/ADA954882.pdf | archive-date=2024-04-27}}</ref>]]
 
SIMD has three different subcategories in [[Flynn's taxonomy#Single instruction stream, multiple data streams (SIMD)|Flynn's 1972 Taxonomy]], one of which is [[single instruction, multiple threads]] (SIMT). SIMT should not be confused with [[Thread (computing)|software threads]] or [[Multithreading (computer architecture)|hardware threads]], both of which are task time-sharing (time-slicing). SIMT is true simultaneous parallel hardware-level execution, such as in the [[ILLIAC IV]].
Line 24:
Another key distinction in SIMT is the presence of control flow mechanisms like warps ([[Nvidia]] terminology) or wavefronts (Advanced Micro Devices ([[AMD]]) terminology). [[ILLIAC IV]] simply called them "Control Signals". These signals ensure that each Processing Element in the entire parallel array is synchronized in its simultaneous execution of the (one, current) broadcast instruction.
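
The lock-step behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real hardware: the names <code>broadcast_step</code> and <code>run_program</code> are hypothetical, each list entry stands for the local data of one Processing Element, and the Python loop runs sequentially where real hardware would act on all elements simultaneously.

```python
from typing import Callable, List

# Hypothetical model: one list entry per Processing Element (PE).
Instruction = Callable[[int], int]


def broadcast_step(pe_values: List[int], instruction: Instruction) -> List[int]:
    """Apply the one current broadcast instruction to every PE's local data.

    Real hardware does this simultaneously; the comprehension only models
    the result, not the parallel timing.
    """
    return [instruction(value) for value in pe_values]


def run_program(pe_values: List[int], program: List[Instruction]) -> List[int]:
    """Issue the instructions one at a time; all PEs stay on the same one."""
    for instruction in program:
        pe_values = broadcast_step(pe_values, instruction)
    return pe_values


pe_array = [1, 2, 3, 4]                       # local data of four PEs
program = [lambda x: x * 2, lambda x: x + 1]  # two broadcast instructions
result = run_program(pe_array, program)
```

In this sketch no element can move on to the second instruction before every element has finished the first, which is the synchronization property that the control signals are described as providing.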
 
Each hardware element (PU, or PE in [[ILLIAC IV]] terminology) working on an individual data item is sometimes also referred to as a [[SIMD lane]] or channel. The ILLIAC IV PE was a scalar 64-bit unit that could do 2x32-bit [[Predication_(computer_architecture)#SIMD,_SIMT_and_vector_predication|predication]]. Modern [[graphics processing unit]]s (GPUs) are invariably wide [[SIMD within a register]] (SWAR) and typically have more than 16 data lanes or channels of such Processing Elements.{{cn|date=July 2024}} Some newer GPUs integrate mixed-precision{{cn|date=July 2025}} SWAR pipelines, which perform concurrent sub-word [[8-bit computing|8-bit]], [[16-bit computing|16-bit]], and [[32-bit computing|32-bit]] operations. This is critical for applications like AI inference, where mixed precision boosts throughput.
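
As a minimal sketch of the sub-word idea behind SWAR, the following Python code adds four 8-bit lanes packed into one 32-bit word using ordinary integer operations. The helper names (<code>pack_u8x4</code>, <code>unpack_u8x4</code>, <code>swar_add_u8x4</code>) are hypothetical, and the masking trick shown is a common software technique, not a claim about how any particular GPU pipeline is built.

```python
LANE_HIGH_BITS = 0x80808080  # top bit of each 8-bit lane
LANE_LOW_BITS = 0x7F7F7F7F   # low seven bits of each 8-bit lane


def pack_u8x4(lanes):
    """Pack four values in 0..255 into one 32-bit word, lane 0 lowest."""
    word = 0
    for position, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * position)
    return word


def unpack_u8x4(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (8 * position)) & 0xFF for position in range(4)]


def swar_add_u8x4(a, b):
    """Add four 8-bit lanes at once, each lane wrapping modulo 256.

    The low seven bits of every lane are added first, so a carry can reach
    only the lane's own top bit and never spills into the next lane. The
    top bits are then combined separately with XOR.
    """
    low_sum = (a & LANE_LOW_BITS) + (b & LANE_LOW_BITS)
    top_bits = (a ^ b) & LANE_HIGH_BITS
    return (low_sum ^ top_bits) & 0xFFFFFFFF


a = pack_u8x4([250, 1, 127, 200])
b = pack_u8x4([10, 2, 1, 100])
lane_sums = unpack_u8x4(swar_add_u8x4(a, b))
```

Each lane behaves like an independent 8-bit adder: here lanes 0 and 3 overflow and wrap around without disturbing their neighbours.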
 
==History==
Line 54:
* Programming with given SIMD instruction sets can involve many low-level challenges.
*# SIMD may have restrictions on [[Data structure alignment|data alignment]]; programmers familiar with a given architecture may not expect this. Worse: the alignment may change from one revision or "compatible" processor to another.
*# Gathering data into SIMD registers and scattering it to the correct destination locations is tricky (sometimes requiring permute instructions or operations) and can be inefficient.
*# Specific instructions like rotations or three-operand addition are not available in some SIMD instruction sets.
*# Instruction sets are architecture-specific: some processors lack SIMD instructions entirely, so programmers must provide non-vectorized implementations (or different vectorized implementations) for them.
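
The gather, permute, and scatter steps mentioned in the list above can be modelled with plain Python lists. This is a sketch under stated assumptions: the functions <code>gather</code>, <code>permute</code>, and <code>scatter</code> are hypothetical helpers that imitate the semantics of such operations, a list stands in for a four-lane SIMD register, and no real instruction set is being reproduced.

```python
def gather(memory, indices):
    """Load one element per lane from arbitrary memory positions."""
    return [memory[i] for i in indices]


def permute(register, pattern):
    """Reorder lanes: output lane k takes the value of input lane pattern[k]."""
    return [register[p] for p in pattern]


def scatter(memory, indices, register):
    """Store each lane to its own destination position (modifies memory)."""
    for i, value in zip(indices, register):
        memory[i] = value


source = [10, 20, 30, 40, 50, 60, 70, 80]
destination = [0, 0, 0, 0]

lanes = gather(source, [6, 0, 3, 5])         # non-contiguous loads
reversed_lanes = permute(lanes, [3, 2, 1, 0])  # reverse the lane order
scatter(destination, [0, 1, 2, 3], reversed_lanes)
```

Even in this toy form, three separate steps and two index vectors are needed to move four values, which hints at why such data movement can cost more than the arithmetic it supports.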