Broadcom included space in all vector operations of the [[Videocore]] IV ISA for a {{code|REP}} field. Unlike the STAR-100, which uses memory for its repeats, the Videocore IV repeats apply to all operations, including arithmetic vector operations. The repeat length can be one of a small range of [[power of two|powers of two]] or sourced from one of the scalar registers.<ref>[https://github.com/hermanhermitage/videocoreiv/wiki/VideoCore-IV-Programmers-Manual Videocore IV Programmer's Manual]</ref>
The [[Cray-1]] introduced the idea of using [[processor register]]s to hold vector data in batches. The batch length (vector length, VL) could be set dynamically with a special instruction. The significance compared to Videocore IV (and, crucially as will be shown below, SIMD as well) is that the repeat length does not have to be part of the instruction encoding. This way, significantly more work can be done in each batch, and the instruction encoding is much more elegant and compact as well. The only drawback is that, to take full advantage of this extra batch processing capacity, memory load and store speed has to increase correspondingly. This is sometimes claimed{{By whom|date=November 2021}} to be a disadvantage of Cray-style vector processors, and the [[Fujitsu_VP#Issues_with_the_design|Fujitsu VP series]] did make the mistake of having only one load/store unit. In reality, high memory bandwidth is simply part of achieving high performance throughput, as seen in [[GPU]]s, which face exactly the same issue.
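The dynamically set vector length described above can be illustrated with a minimal C sketch. It is only a scalar model, not Cray-1 code: {{code|set_vl}} is a hypothetical stand-in for the special "set vector length" instruction, {{code|MAX_VL}} assumes 64-element vector registers (the Cray-1 register size), and the inner loop stands in for what a single sequence of vector instructions would do to one batch.

```c
#include <stddef.h>

/* Assumption: each vector register holds up to 64 elements. */
#define MAX_VL 64

/* Hypothetical stand-in for a "set vector length" instruction:
   returns how many elements the next batch will process. */
static size_t set_vl(size_t remaining)
{
    return remaining < MAX_VL ? remaining : MAX_VL;
}

/* Computes y[i] = a * x[i] + y[i] for n elements, one batch at a time. */
void vec_axpy(size_t n, float a, const float *x, float *y)
{
    size_t i = 0;
    while (i < n) {
        /* The batch length is chosen at run time, not encoded in the
           instruction, so the final short batch needs no special case. */
        size_t vl = set_vl(n - i);

        /* Models one vector load / multiply-add / store sequence
           operating on vl elements. */
        for (size_t j = 0; j < vl; j++)
            y[i + j] = a * x[i + j] + y[i + j];

        i += vl;
    }
}
```

With {{code|n}} = 100, this model processes one batch of 64 elements followed by one batch of 36, using the same instruction sequence for both.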
Modern SIMD computers claim to improve on early Cray by directly using multiple ALUs, for a higher degree of parallelism compared to only using the normal scalar pipeline. Modern vector processors (such as the [[SX-Aurora TSUBASA]]) combine both, by issuing multiple data to multiple internal pipelined SIMD ALUs, the number issued being dynamically chosen by the vector program at runtime. Masks can be used to selectively load and store data in memory locations, and the same masks can selectively disable individual processing elements of the SIMD ALUs. Some processors with SIMD ([[AVX-512]], ARM [[Scalable Vector Extension|SVE2]]) are capable of this kind of selective, per-element ([[Predication (computer architecture)|"predicated"]]) processing, and it is these that arguably deserve the name "vector processor", or at least the claim of being capable of "vector processing". SIMD processors without per-element predication