Revision as of 08:34, 17 February 2011 by Quuxplusone (copyedit, clean up)
Revision as of 08:43, 17 February 2011 by SmackBot (m Dated {{Who}}. (Build ))
Line 10:
The [[VMX]] (Vector Multimedia Extensions) technology is conceptually similar to the vector model provided by the SPU processors, but there are many significant differences.

{| class="wikitable" style="margin: 1em auto 1em auto"
|+ '''VMX to SPU Comparison'''{{ref|vmxrefman}}<!-- 333 pages --><br>''unfinished''
! feature || VMX || SPU
|-
! [[word (computer science)|word]] size
| 32 bits || 32 bits
|-
! number of [[register (computer science)|registers]]
| 32 <!-- p.28/333 --> || 128 <!-- p.34/333 also shows 32 GP and 32 FP regs, are these part of VMX? -->
|-
! register width
| 128-bit quadword <!-- p.28/333 --> || 128-bit quadword
|-
! [[integer]] formats
| 8, 16, 32 <!-- p.26/333 --> || 8, 16, 32, 64 <!-- checked: there is no doubleword add or mul instr. -->
|-
! saturation support
| yes <!-- p.26/333 --> || no <!-- check this -->
|-
! byte ordering
| big (default), little <!--p.44/333 --> || big endian
|-
! [[floating point (computer science)|floating point]] modes
| Java, non-Java || single precision, IEEE double
|-
! [[memory (computer science)|memory]] alignment
| quadword only || quadword only
|}

Line 45:
====Intrinsics====
Compilers for Cell{{Who|date=February 2011}} provide [[intrinsic function|intrinsic]]s to expose useful SPU instructions in C and C++. Instructions that differ only in the type of operand (such as a, ai, ah, ahi, fa, and dfa for addition) are typically represented by a single C/C++ intrinsic which selects the proper instruction based on the type of the operand.

====Porting VMX code for SPU====

Line 59:
Transferring data between the local stores of different SPUs can have a large performance cost. The local stores of individual SPUs can be exploited using a variety of strategies.
Applications with high locality, such as dense matrix computations, represent an ideal workload class for the local stores in Cell BE.<ref>{{cite web|url=http://www.research.ibm.com/people/m/mikeg/papers/2006_ieeemicro.pdf|format=PDF|title=Synergistic Processing in Cell's Multicore Architecture|date=March 2006}}</ref> Streaming computations can be accommodated efficiently by [[software pipelining]] the memory block transfers with a multi-buffering strategy.<ref name="research.ibm.com"/>
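
The multi-buffering idea can be illustrated with a minimal double-buffering sketch in portable C. This is not SPU code: the function names, the <code>CHUNK</code> size, and the use of <code>memcpy</code> as a stand-in for an asynchronous DMA transfer are all assumptions made for illustration. On real hardware the fetch would be a non-blocking MFC DMA command, so that the transfer of the next block overlaps with computation on the current one; here the two steps simply run one after the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4 /* elements per block; hypothetical, chosen small for illustration */

/* Stand-in for an asynchronous DMA "get" from main memory into local store.
   Assumption: a real SPU program would issue an MFC DMA command here. */
static void fetch_block(float *local, const float *remote, size_t n)
{
    memcpy(local, remote, n * sizeof *local);
}

/* Number of valid elements in block b of a stream of `total` elements. */
static size_t block_len(size_t b, size_t total)
{
    assert(b * CHUNK < total);
    size_t remaining = total - b * CHUNK;
    return remaining < CHUNK ? remaining : CHUNK;
}

/* Sum `total` floats from `remote`, streaming them through two local buffers. */
float sum_double_buffered(const float *remote, size_t total)
{
    float buf[2][CHUNK];
    size_t nblocks = (total + CHUNK - 1) / CHUNK;
    float sum = 0.0f;

    if (nblocks == 0)
        return sum;

    /* Prime the pipeline: fetch block 0 into buffer 0. */
    fetch_block(buf[0], remote, block_len(0, total));

    for (size_t b = 0; b < nblocks; b++) {
        size_t cur = b & 1; /* buffer holding the block to process now */

        /* Start fetching the next block into the other buffer. */
        if (b + 1 < nblocks)
            fetch_block(buf[cur ^ 1], remote + (b + 1) * CHUNK,
                        block_len(b + 1, total));

        /* With real DMA, one would wait here for the transfer into buf[cur]. */
        size_t n = block_len(b, total);
        for (size_t i = 0; i < n; i++)
            sum += buf[cur][i];
    }
    return sum;
}
```

The two buffers alternate roles on every iteration: one is being computed on while the other is being filled. Using more than two buffers follows the same pattern and can hide longer transfer latencies at the cost of local store space.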

Cell software development: Difference between revisions