Cell software development: Difference between revisions

copyedit, clean up
SmackBot (talk | contribs)
m Dated {{Who}}. (Build )
Line 10:
The [[VMX]] (Vector Multimedia Extensions) technology is conceptually similar to the vector model provided by the SPU processors, but there are many significant differences.
 
{| class="wikitable" style="margin: 1em auto 1em auto"
|+ '''VMX to SPU Comparison'''{{ref|vmxrefman}}<!-- 333 pages --><br>''unfinished''
! feature || VMX || SPU
|-
! [[word (computer science)|word]] size
| 32 bits || 32 bits
|-
! number of [[register (computer science)|registers]]
| 32 <!-- p.28/333 --> || 128
<!-- p.34/333 also shows 32 GP and 32 FP regs, are these part of VMX? -->
|-
! register width
| 128-bit quadword <!-- p.28/333 --> || 128-bit quadword
|-
! [[integer]] formats
| 8, 16, 32 <!-- p.26/333 --> || 8, 16, 32, 64 <!-- checked: there is no doubleword add or mul instr. -->
|-
! saturation support
| yes <!-- p.26/333 --> || no <!-- check this -->
|-
! byte ordering
| big (default), little <!--p.44/333 --> || big endian
|-
! [[floating point (computer science)|floating point]] modes
| Java, non-Java || single precision, IEEE double
|-
! [[memory (computer science)|memory]] alignment
| quadword only || quadword only
|}
 
Line 45:
 
====Intrinsics====
Compilers for Cell{{Who|date=February 2011}} provide [[intrinsic function|intrinsic]]s to expose useful SPU instructions in C and C++. Instructions that differ only in operand type (such as a, ai, ah, ahi, fa, and dfa for addition) are typically represented by a single C/C++ intrinsic, which selects the proper instruction based on the types of its operands.
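
The type-based selection described above can be illustrated with a portable sketch. This is not the SPU toolchain's implementation: the vector types are modelled as plain structs, and the names <code>vec_int4</code>, <code>vec_short8</code>, <code>vec_float4</code>, <code>add_a</code>, <code>add_ah</code>, <code>add_fa</code> and <code>generic_add</code> are hypothetical, chosen only to echo the instruction mnemonics. It assumes a C11 compiler and uses <code>_Generic</code> to pick one helper per operand type.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit vector types, modelled as plain structs so the
   sketch compiles on any C11 compiler. These are NOT the real SPU types. */
typedef struct { int32_t v[4]; } vec_int4;
typedef struct { int16_t v[8]; } vec_short8;
typedef struct { float   v[4]; } vec_float4;

/* Each model type occupies one 128-bit quadword. */
static_assert(sizeof(vec_int4) == 16, "vec_int4 should be 16 bytes");

/* One helper per underlying instruction; names echo a, ah and fa. */
static inline vec_int4 add_a(vec_int4 x, vec_int4 y) {
    for (int i = 0; i < 4; i++)   /* word add, wraps on overflow */
        x.v[i] = (int32_t)((uint32_t)x.v[i] + (uint32_t)y.v[i]);
    return x;
}

static inline vec_short8 add_ah(vec_short8 x, vec_short8 y) {
    for (int i = 0; i < 8; i++)   /* halfword add, wraps on overflow */
        x.v[i] = (int16_t)((uint16_t)x.v[i] + (uint16_t)y.v[i]);
    return x;
}

static inline vec_float4 add_fa(vec_float4 x, vec_float4 y) {
    for (int i = 0; i < 4; i++)   /* single-precision add */
        x.v[i] += y.v[i];
    return x;
}

/* A single "intrinsic": the type of the first operand selects the helper
   at compile time, so no run-time dispatch takes place. */
#define generic_add(x, y) _Generic((x), \
    vec_int4:   add_a,                  \
    vec_short8: add_ah,                 \
    vec_float4: add_fa)((x), (y))
```

Under these assumptions, <code>generic_add(a, b)</code> resolves to <code>add_a</code> when both operands are <code>vec_int4</code> and to <code>add_fa</code> when they are <code>vec_float4</code>; an operand of an unlisted type is rejected at compile time.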
 
====Porting VMX code for SPU====
Line 59:
Transferring data between the local stores of different SPUs can have a large performance cost. The local stores of individual SPUs can be exploited using a variety of strategies.
 
Applications with high locality, such as dense matrix computations, represent an ideal workload class for the local stores in Cell BE.<ref>{{cite web|url=http://www.research.ibm.com/people/m/mikeg/papers/2006_ieeemicro.pdf|format=PDF|title=Synergistic Processing in Cell's Multicore Architecture|date=March 2006}}</ref>
 
Streaming computations can be accommodated efficiently by applying [[software pipelining]] to memory block transfers with a multi-buffering strategy.<ref name="research.ibm.com"/>
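
A minimal sketch of the multi-buffering idea, using two buffers (double buffering), is given here in portable C. It is a simulation under stated assumptions rather than SPU code: <code>fetch_block</code> is a hypothetical stand-in that copies synchronously with <code>memcpy</code>, whereas on real hardware the transfer would be started asynchronously and waited on just before the data is used. The names <code>BLOCK</code>, <code>block_len</code> and <code>stream_sum</code> are likewise illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { BLOCK = 4 };  /* elements per transfer; illustrative value only */

/* Hypothetical stand-in for an asynchronous block transfer into the local
   store. Here it is a plain synchronous memcpy. */
static void fetch_block(float *local, const float *main_mem, size_t n) {
    memcpy(local, main_mem, n * sizeof *local);
}

/* Number of valid elements in block b of a stream of `total` elements. */
static size_t block_len(size_t b, size_t total) {
    size_t start = b * BLOCK;
    return (total - start < BLOCK) ? total - start : BLOCK;
}

/* Sums `total` floats from `src`, one block at a time. While block b is
   being processed from one buffer, block b+1 is fetched into the other. */
static float stream_sum(const float *src, size_t total) {
    assert(src != NULL || total == 0);

    float buf[2][BLOCK];
    size_t nblocks = (total + BLOCK - 1) / BLOCK;
    float sum = 0.0f;

    if (nblocks == 0)
        return sum;

    /* Prologue: fill the first buffer before the steady-state loop. */
    fetch_block(buf[0], src, block_len(0, total));

    for (size_t b = 0; b < nblocks; b++) {
        size_t cur = b & 1;  /* buffer holding the block to process */

        /* Start fetching the next block into the other buffer. */
        if (b + 1 < nblocks)
            fetch_block(buf[cur ^ 1], src + (b + 1) * BLOCK,
                        block_len(b + 1, total));

        /* Process the current block. */
        size_t n = block_len(b, total);
        for (size_t i = 0; i < n; i++)
            sum += buf[cur][i];
    }
    return sum;
}
```

In this sketch the fetch and the computation run one after the other, so only the buffer rotation is demonstrated. The overlap that hides transfer latency would come from replacing <code>fetch_block</code> with a genuinely asynchronous transfer and waiting for its completion at the top of the next iteration.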