Vector processor: Difference between revisions

Content deleted Content added
Citation bot (talk | contribs)
Add: date, title. Changed bare reference to CS1/2. | Use this bot. Report bugs. | Suggested by Guy Harris | #UCB_webform
m Removing link(s) Wikipedia:Articles for deletion/Permute instruction closed as soft delete (XFDcloser)
 
(One intermediate revision by one other user not shown)
Line 489:
Compared to any SIMD processor claiming to be a vector processor, the order of magnitude reduction in program size is almost shocking. However, this level of elegance at the ISA level has quite a high price tag at the hardware level:
# From the IAXPY example, it can be seen that unlike SIMD processors, which can simplify their internal hardware by avoiding dealing with misaligned memory access, a vector processor cannot get away with such simplification: algorithms are written which inherently rely on Vector Load and Store being successful, regardless of alignment of the start of the vector.
# Whilst from the reduction example it can be seen that, aside from [[permute instruction]]sinstructions, SIMD by definition avoids inter-lane operations entirely (element 0 can only be added to another element 0), vector processors tackle this head-on. What programmers are forced to do in software (using shuffle and other tricks, to swap data into the right "lane") vector processors must do in hardware, automatically.
 
Overall then there is a choice to either have
Line 510:
* '''Masked Operations''' – [[Predication (computer architecture)|predicate masks]] allow parallel if/then/else constructs without resorting to branches. This allows code with conditional statements to be vectorized.
* '''Compress and Expand''' – usually using a bit-mask, data is linearly compressed or expanded (redistributed) based on whether bits in the mask are set or clear, whilst always preserving the sequential order and never duplicating values (unlike Gather-Scatter aka permute). These instructions feature in [[AVX-512#Compress and expand|AVX-512]].
* '''Register Gather, Scatter (aka permute)'''<ref>[https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#vector-register-gather-instructions RVV register gather-scatter instructions]</ref> – a less restrictive more generic variation of the compress/expand theme which instead takes one vector to specify the indices to use to "reorder" another vector. Gather/scatter is more complex to implement than compress/expand, and, being inherently non-sequential, can interfere with [[Chaining (vector processing)|vector chaining]]. Not to be confused with [[Gather-scatter]] Memory Load/Store modes, Gather/scatter vector operations act on the vector registers, and are often termed a [[permute instruction]] instead.
* '''Splat and Extract''' – useful for interaction between scalar and vector, these broadcast a single value across a vector, or extract one item from a vector, respectively.
* '''Iota''' – a very simple and strategically useful instruction which drops sequentially-incrementing immediates into successive elements. Usually starts from zero.
Line 523:
 
* '''Sub-vectors''' – elements may typically contain two, three or four sub-elements (vec2, vec3, vec4) where any given bit of a predicate mask applies to the whole vec2/3/4, not the elements in the sub-vector. Sub-vectors are also introduced in RISC-V RVV (termed "LMUL").<ref>[https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#mapping-for-lmul-1-2 LMUL > 1 in RVV]</ref> Subvectors are a critical integral part of the [[Vulkan]] [[SPIR-V]] spec.
* '''Sub-vector [[Swizzling (computer graphics)|swizzle]]''' – aka "Lane Shuffling" which allows sub-vector inter-element computations without needing extra (costly, wasteful) instructions to move the sub-elements into the correct SIMD "lanes" and also saves predicate mask bits. Effectively an in-flight [[permute instruction|mini-permute]] of the sub-vector, this heavily features in 3D Shader binaries (as high as 20% of all instructions<ref>{{cite web | title=Tom Forsyth - the Lifecycle of an Instruction Set | date=22 August 2020 | url=https://vimeo.com/450406346 }}</ref>) and is sufficiently important as to be part of the Vulkan SPIR-V spec. The Broadcom [[Videocore]] IV uses the terminology "Lane rotate"<ref>[https://patents.google.com/patent/US20110227920 Abandoned US patent US20110227920-0096]</ref> where the rest of the industry uses the term [[Swizzling (computer graphics)|"swizzle"]].<ref>[https://github.com/hermanhermitage/videocoreiv-qpu Videocore IV QPU]</ref>
* '''Transcendentals''' – [[trigonometric]] operations such as [[sine]], [[cosine]] and [[logarithm]] obviously feature much more predominantly in 3D than in many demanding [[High-performance computing|HPC]] workloads. Of interest, however, is that speed is far more important than accuracy in 3D for GPUs, where computation of pixel coordinates simply do not require high precision. The Vulkan specification recognises this and sets surprisingly low accuracy requirements, so that GPU Hardware can reduce power usage. The concept of reducing accuracy where it is simply not needed is explored in the [[MIPS-3D]] extension.
 
Line 550:
* [[SX architecture]]
* [[Duncan's taxonomy#Pipelined vector processors|Duncan's taxonomy]] on pipelined vector processors
* [[GPGPU]]
* {{Annotated link|General-purpose computing on graphics processing units (hardware)|GPGPU hardware}}
* [[Compute kernel]]
* [[Stream processing]]