Cell software development: Difference between revisions

Revision as of 23:17, 13 December 2006 by CmdrObot (edit summary: sp (2): boundarys→boundaries, et al.→''et al.''). Latest revision as of 09:52, 11 June 2025 by BoxerJon (no edit summary; tag: Visual edit).
(63 intermediate revisions by 49 users not shown)
'''Software development''' for the [[Cell microprocessor]] involves a mixture of conventional development practices for the [[PowerPC]]-compatible PPU core, and novel software development challenges with regard to the functionally reduced SPU coprocessors.

{{Cell microprocessor segments}}

==Linux on Cell==
An open source software-based strategy was adopted to accelerate the development of a Cell BE ecosystem and to provide an environment to develop Cell applications, including a GCC-based Cell compiler, binutils and a port of the Linux operating system.<ref name="research.ibm.com">{{cite web|url=http://www.research.ibm.com/people/m/mikeg/papers/2007_ieeecomputer.pdf|title=An Open Source Environment for Cell Broadband Engine System Software|date=June 2007}}</ref>

==Octopiler==
'''Octopiler''' is [[IBM]]'s prototype [[compiler]] to allow [[software developer]]s to write [[software code|code]] for [[Cell processor]]s.<ref>{{citation|title=Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture|date=2017-10-23|url=http://www.research.ibm.com/journal/sj/451/eichenberger.html|archive-url=https://web.archive.org/web/20060411094457/http://www.research.ibm.com/journal/sj/451/eichenberger.html|archive-date=2006-04-11|url-status=dead|publisher=IBM Systems Journal}}</ref><ref>{{Cite web|date=2006-01-20|title=Compiler Technology for Scalable Architectures|url=https://www.empat.tech/services|archive-url=https://web.archive.org/web/20080320071448/http://domino.research.ibm.com/comm/research_projects.nsf/pages/cellcompiler.index.html|archive-date=2008-03-20|access-date=2025-06-11|website=IBM Research|language=en-us}}</ref><ref>{{Cite web|last=Stokes|first=Jon|date=2006-02-26|title=IBM's Octopiler, or, why the PS3 is running late|url=https://arstechnica.com/uncategorized/2006/02/6265-2/|access-date=2025-06-11|website=Ars Technica}}</ref>

* [http://news.zdnet.com/2100-9593_22-6042132.html Octopiler seeks to arm Cell programmers] <!-- Correction: This article misstated the nature of the processor core in IBM's Cell. The processor core uses the same instruction set as the PowerPC 970, therefore letting it run the same software. The core is a fellow member of IBM's PowerPC AS family, but is not a PowerPC 970. -->
* International Symposium on Code Generation and Optimization (CGO'06)

==Software portability==
===Adapting VMX for SPU===
====Differences between VMX and SPU====
The [[AltiVec|VMX]] (Vector Multimedia Extensions) technology is conceptually similar to the vector model provided by the SPU processors, but there are many significant differences.
{| class="wikitable" style="margin: 1em auto 1em auto"
|+ '''VMX to SPU Comparison'''{{ref|vmxrefman}}<!-- 333 pages --><br>''unfinished''
! feature || VMX || SPU
|-
! [[Word (data type)|word]] size
| 32 bits || 32 bits
|-
! number of [[Processor register|registers]]
| 32 <!-- p.28/333 --> || 128
<!-- p.34/333 also shows 32 GP and 32 FP regs, are these part of VMX? -->
|-
! register width
| 128-bit quadword <!-- p.28/333 --> || 128-bit quadword
|-
! [[integer]] formats
| 8, 16, 32 <!-- p.26/333 --> || 8, 16, 32, 64 <!-- checked: there is no doubleword add or mul instr. -->
|-
! saturation support
| yes <!-- p.26/333 --> || no <!-- check this -->
|-
! byte ordering
| big (default), little <!--p.44/333 --> || big endian
|-
! [[Floating-point arithmetic|floating point]] modes
| Java, non-Java || single precision, IEEE double
|-
! [[Data structure alignment|Memory alignment]]
| quadword only || quadword only
|}

The VMX ''[[Java (programming language)|Java]] mode'' conforms to the [[Java Language Specification]] 1 subset of the default [[IEEE Standard]], extended to include IEEE and [[C99|C9X]] compliance where the Java standard falls silent.
In a typical implementation, non-Java mode converts [[denormal]] values to zero, whereas Java mode traps into an emulator when the processor encounters such a value.

The IBM ''PPE Vector/SIMD manual'' does not define operations for double-precision floating point, though IBM has published material implying certain double-precision performance numbers associated with the Cell PPE VMX technology.

====Intrinsics====
Compilers for Cell{{Who|date=February 2011}} provide [[intrinsic function|intrinsic]]s to expose useful SPU instructions in C and C++. Instructions that differ only in the type of operand (such as a, ai, ah, ahi, fa, and dfa for addition) are typically represented by a single C/C++ intrinsic which selects the proper instruction based on the type of the operand.

====Porting VMX code for SPU====
A substantial amount of VMX ([[Altivec]]) code developed for [[IBM Power microprocessors]], particularly under the [[PowerPC]] version of [[macOS]], can potentially be adapted for use on the SPU.
The feasibility of porting depends on the extent of VMX-specific features used; the effort ranges from straightforward to impractical. However, key workloads typically map well to the SPU architecture.

In some cases, existing VMX code can be ported directly. If the VMX code is highly generic (makes few assumptions about the execution environment), the translation can be relatively straightforward. The two processors specify a different [[binary format|binary code format]], so recompilation is required at a minimum. Even where [[Instruction (computer science)|instructions]] exist with the same behaviors, they do not have the same instruction names, so the names must be mapped as well. IBM's development toolkit includes compiler intrinsics that automate much of this mapping.

In many cases, however, a directly equivalent instruction does not exist. The workaround might be obvious or it might not. For example, if saturation behavior is required on the SPU, it can be coded by adding additional SPU instructions to accomplish this (with some loss of efficiency). At the other extreme, if Java floating-point semantics are required, this is almost impossible to achieve on the SPU processor. To achieve the same computation on the SPU might require that an entirely different [[algorithm]] be written from scratch.

The most important conceptual similarity between VMX and the SPU architecture is that both support the same [[vectorization model]]. For this reason, most algorithms adapted to Altivec will usually adapt successfully to the SPU architecture as well.
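The saturation workaround mentioned above can be illustrated with a small model. The following Python sketch is not SPU code: it treats a 128-bit quadword as a list of four unsigned 32-bit lanes and builds an unsigned saturating add out of a wrapping add, a lane-wise compare and a bitwise select. This is one plausible way such an instruction sequence might be composed, under the assumption that only wrapping arithmetic is available; all function names are hypothetical and do not correspond to actual VMX or SPU intrinsics.

```python
# Illustrative model only: a quadword is a list of four unsigned 32-bit lanes.
# Every name below is hypothetical; none is a real VMX or SPU intrinsic.

LANES = 4                 # four 32-bit words per 128-bit quadword
MASK = 0xFFFFFFFF         # keeps each lane within 32 bits


def add_modulo(a, b):
    """Lane-wise wrapping add, i.e. an add without saturation support."""
    return [(x + y) & MASK for x, y in zip(a, b)]


def compare_gt(a, b):
    """Lane-wise unsigned compare: all-ones where a > b, zero elsewhere."""
    return [MASK if x > y else 0 for x, y in zip(a, b)]


def select(a, b, mask):
    """Bitwise select: bits of b where mask is 1, bits of a where it is 0."""
    return [(x & ~m & MASK) | (y & m) for x, y, m in zip(a, b, mask)]


def add_saturate_unsigned(a, b):
    """Unsigned saturating add built from the three primitives above."""
    total = add_modulo(a, b)
    # An unsigned add wrapped exactly when the result is smaller than an input.
    overflowed = compare_gt(a, total)
    # Replace wrapped lanes with the largest representable value.
    return select(total, [MASK] * LANES, overflowed)
```

In this model each saturating add costs one extra compare and one extra select on top of the plain add, which gives a rough sense of the efficiency loss the text refers to. Signed saturation would need a different overflow test and is not covered here.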
==Local store exploitation==
Transferring data between the local stores of different SPUs can have a large performance cost. The local stores of individual SPUs can be exploited using a variety of strategies. Applications with high locality, such as dense matrix computations, represent an ideal workload class for the local stores in Cell BE.<ref>{{cite web|url=http://www.research.ibm.com/people/m/mikeg/papers/2006_ieeemicro.pdf|title=Synergistic Processing in Cell's Multicore Architecture|date=March 2006}}</ref> Streaming computations can be efficiently accommodated by [[software pipelining]] of memory block transfers with a multi-buffering strategy.<ref name="research.ibm.com"/> The software cache offers a solution for random accesses.<ref>{{cite web|url=http://www.research.ibm.com/journal/sj/451/eichenberger.pdf|title=Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture|date=January 2006}}</ref> More sophisticated applications can use multiple strategies for different data types.<ref>{{cite web|url=http://www.research.ibm.com/cell/papers/2008_vee_cellgc.pdf|title=Cell GC: Using the Cell Synergistic Processor as a Garbage Collection Coprocessor |date=March 2008}}</ref>

====References====
* [http://www.research.ibm.com/cell/ The Cell Project at IBM Research]
* [http://cag.csail.mit.edu/crg/papers/eichenberger05cell.pdf Optimizing Compiler for a CELL Processor]
* [https://dx.doi.org/10.1147/sj.451.0059 Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture] <!-- repackaging of Eichenberger ''et al.'' above -->
* [http://domino.research.ibm.com/comm/research_projects.nsf/pages/cellcompiler.index.html Compiler Technology for Scalable Architectures]

{{reflist}}

{{DEFAULTSORT:Cell Software Development}}
[[Category:Cell BE architecture]]
[[Category:Compilers]]
[[Category:Vaporware]]