Cell software development: Difference between revisions

m References: fixed a syntax error for a hyperlink in the references section
m gen fixes: (6) set identical unnamed references to use named refs (1), add '<references />' using AWB
Line 6:
|}
 
'''Software development''' for the [[cell microprocessor]] involves a mixture of conventional development practices for the [[IBM POWER|POWER architecture]]-compatible PPU core, and novel software development challenges with regard to the functionally reduced SPU coprocessors.
 
==Cell SDK==
{{Cell microprocessor segments}}
 
Line 16:
===IBM Octopiler===
====References====
<references /><!--added under references heading by script-assisted edit-->
* [http://news.zdnet.com/2100-9593_22-6042132.html Octopiler seeks to arm Cell programmers] <!-- Correction: This article misstated the nature of the processor core in IBM's Cell. The processor core uses the same instruction set as the PowerPC 970, therefore letting it run the same software. The core is a fellow member of IBM's PowerPC AS family, but is not a PowerPC 970. -->
* International Symposium on Code Generation and Optimization (CGO'06)
 
==Linux on Cell==
An open source software-based strategy was adopted to accelerate the development of a Cell BE ecosystem and to provide an environment to develop Cell applications, including a GCC-based Cell compiler, binutils and a port of the Linux operating system.<ref name="research.ibm.com">{{cite web|url=http://www.research.ibm.com/people/m/mikeg/papers/2007_ieeecomputer.pdf|format=PDF|title=An Open Source Environment for Cell Broadband Engine System Software|date=[[2007-06]]}}</ref>
 
==Software portability==
===Adapting VMX for SPU===
====Differences between VMX and SPU====
The [[VMX]] technology is conceptually similar to the [[vector model]] provided by the [[SPU processors]], but there are many significant differences.
 
{| class="wikitable" style="margin: 1em auto 1em auto"
Line 69 ⟶ 70:
A large body of code developed for other IBM [[Power processors]] could potentially be adapted and recompiled to run on the SPU. This code base includes VMX code that runs under the [[PowerPC]] version of [[Apple Computer|Apple's]] [[Mac OS X]], where it is better known as [[Altivec]]. Depending on how many VMX-specific features are involved, the adaptation can range from straightforward, to onerous, to completely impractical. The most important workloads for the SPU generally map quite well.
 
In some cases it is possible to port existing VMX code directly. If the VMX code is highly generic (makes few assumptions about the execution environment) the translation can be relatively straightforward. The two processors specify different [[binary format|binary code formats]], so recompilation is required at a minimum. Even where [[instructions]] with the same behaviour exist, they do not have the same names, so these must be mapped as well. IBM provides compiler intrinsics, as part of the development toolkit, that take care of this mapping transparently.
 
In many cases, however, a directly equivalent instruction does not exist. The workaround might be obvious or it might not. For example, if saturation behaviour is required on the SPU, it can be coded with additional SPU instructions (at some loss of efficiency). At the other extreme, if Java floating-point semantics are required, they are almost impossible to achieve on the SPU processor. Achieving the same [[computation]] on the SPU might require an entirely different [[algorithm]] written from scratch.
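
The saturation workaround can be illustrated with a minimal sketch. It is written in portable scalar C rather than with SPU intrinsics, and the helper name <code>add_sat_u8</code> is hypothetical. It builds an unsigned 8-bit saturating add from a wrapping add, a compare and a select, which is the general shape such an emulation might take when no saturating instruction is available.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: emulate an unsigned 8-bit saturating add using only
   a wrapping add, a compare, and a mask-based select. */
static void add_sat_u8(const uint8_t *a, const uint8_t *b, uint8_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Step 1: modular add, which wraps around on overflow. */
        uint8_t sum = (uint8_t)(a[i] + b[i]);
        /* Step 2: a wrapped result is always smaller than either operand,
           so the compare yields an all-ones mask exactly on overflow. */
        uint8_t overflow = (sum < a[i]) ? 0xFF : 0x00;
        /* Step 3: OR with the mask forces the result to 255 on overflow
           and leaves it unchanged otherwise. */
        out[i] = (uint8_t)(sum | overflow);
    }
}
```

Each element costs three operations instead of one, which is the loss of efficiency mentioned above.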
 
The most important conceptual similarity between VMX and the SPU architecture is that both support the same [[vectorization model]]. For this reason, most algorithms successfully adapted to Altivec will also adapt successfully to the SPU architecture.
 
==Local store exploitation==
Local stores can be exploited using a variety of strategies.
 
Applications with high locality, such as dense matrix computations, represent an ideal workload class for the local stores in Cell BE.
<ref>{{cite web|url=http://www.research.ibm.com/people/m/mikeg/papers/2006_ieeemicro.pdf|format=PDF|title=Synergistic Processing in Cell's Multicore Architecture|date=[[2006-03]]}}</ref>
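
As a rough illustration of why dense matrix code suits a small local memory, the following portable C sketch multiplies two matrices tile by tile. The sizes <code>N</code> and <code>T</code>, the function names, and the use of <code>memcpy</code> in place of a transfer into the local store are all assumptions made for the example. It is not code from the Cell SDK.

```c
#include <stddef.h>
#include <string.h>

#define N 8  /* matrix dimension (illustrative) */
#define T 4  /* tile dimension; N is assumed to be a multiple of T */

/* Copy one T x T tile of a row-major N x N matrix into a small buffer.
   The buffer stands in for a region of the local store. */
static void copy_tile(float dst[T][T], const float *src, size_t r0, size_t c0)
{
    for (size_t r = 0; r < T; r++)
        memcpy(dst[r], src + (r0 + r) * N + c0, sizeof dst[r]);
}

/* c = a * b for row-major N x N matrices, computed tile by tile. */
static void matmul_tiled(const float *a, const float *b, float *c)
{
    float ta[T][T], tb[T][T], tc[T][T];

    for (size_t i0 = 0; i0 < N; i0 += T) {
        for (size_t j0 = 0; j0 < N; j0 += T) {
            memset(tc, 0, sizeof tc);
            for (size_t k0 = 0; k0 < N; k0 += T) {
                copy_tile(ta, a, i0, k0);
                copy_tile(tb, b, k0, j0);
                /* Every element fetched is reused T times from the buffers. */
                for (size_t i = 0; i < T; i++)
                    for (size_t j = 0; j < T; j++)
                        for (size_t k = 0; k < T; k++)
                            tc[i][j] += ta[i][k] * tb[k][j];
            }
            /* Write the finished output tile back to "main memory". */
            for (size_t r = 0; r < T; r++)
                memcpy(c + (i0 + r) * N + j0, tc[r], sizeof tc[r]);
        }
    }
}
```

Only three small tiles are resident at any time, and each fetched element takes part in <code>T</code> multiply-adds before it is replaced.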
 
Streaming computations can be efficiently accommodated using software pipelining of memory block transfers with a multi-buffering strategy.<ref>{{cite web|url=http://www.research.ibm.com/people/m/mikeg/papers/2007_ieeecomputer.pdf|title=An Open Source Environment for Cell Broadband Engine System Software|date=[[2007-06]]}}</ref>
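
A minimal double-buffering sketch in portable C follows. The names <code>fetch_block</code> and <code>stream_sum</code> and the block size <code>CHUNK</code> are hypothetical, and the synchronous <code>memcpy</code> only stands in for an asynchronous block transfer. On real hardware the fetch of the next block would be issued without blocking and waited on just before that block is consumed, so the transfer overlaps with computation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4  /* elements per block (illustrative size) */

/* Stand-in for an asynchronous transfer from main memory into a
   local-store buffer; here it simply copies synchronously. */
static void fetch_block(int *local, const int *remote, size_t count)
{
    memcpy(local, remote, count * sizeof *local);
}

/* Sum a stream of n integers using two alternating buffers. */
static long stream_sum(const int *remote, size_t n)
{
    int buf[2][CHUNK];
    size_t nblocks = (n + CHUNK - 1) / CHUNK;
    long total = 0;
    int cur = 0;

    assert(remote != NULL || n == 0);
    if (nblocks == 0)
        return 0;

    /* Prime the pipeline with the first block. */
    fetch_block(buf[cur], remote, n < CHUNK ? n : CHUNK);

    for (size_t b = 0; b < nblocks; b++) {
        size_t count = (b + 1 == nblocks) ? n - b * CHUNK : CHUNK;

        /* Start fetching the next block into the other buffer. */
        if (b + 1 < nblocks) {
            size_t off = (b + 1) * CHUNK;
            size_t next = (n - off < CHUNK) ? n - off : CHUNK;
            fetch_block(buf[cur ^ 1], remote + off, next);
        }

        /* Real code would wait here for the transfer into buf[cur]. */
        for (size_t i = 0; i < count; i++)
            total += buf[cur][i];

        cur ^= 1;  /* swap the roles of the two buffers */
    }
    return total;
}
```

Using more than two buffers follows the same pattern, with fetches issued further ahead of the block being processed.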
 
The software cache offers a solution for random accesses.<ref>{{cite web|url=http://www.research.ibm.com/journal/sj/451/eichenberger.pdf|format=PDF|title=Using advanced compiler technology to exploit the performance of the Cell Broadband Engine architecture|date=[[2006-01]]}}</ref>
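
The idea can be sketched as a small direct-mapped, read-only cache in portable C. The type <code>sw_cache</code>, its functions, and the sizes are hypothetical and are not the interface of any IBM software cache library. The sketch also assumes that the length of the backing array is a multiple of the line size.

```c
#include <stddef.h>
#include <string.h>

#define LINE_WORDS 4  /* words per cache line (illustrative) */
#define NUM_LINES  8  /* lines kept in the local store (illustrative) */

/* Hypothetical direct-mapped, read-only software cache. */
typedef struct {
    const int *backing;               /* stands in for main memory */
    int data[NUM_LINES][LINE_WORDS];  /* cached lines in the "local store" */
    size_t tag[NUM_LINES];            /* which memory line each slot holds */
    int valid[NUM_LINES];
    unsigned hits, misses;
} sw_cache;

static void sw_cache_init(sw_cache *c, const int *backing)
{
    memset(c, 0, sizeof *c);
    c->backing = backing;
}

/* Read word `index` of the backing array through the cache. */
static int sw_cache_read(sw_cache *c, size_t index)
{
    size_t line = index / LINE_WORDS;
    size_t slot = line % NUM_LINES;

    if (!c->valid[slot] || c->tag[slot] != line) {
        /* Miss: fetch the whole line; real code would use a block transfer. */
        memcpy(c->data[slot], c->backing + line * LINE_WORDS, sizeof c->data[slot]);
        c->tag[slot] = line;
        c->valid[slot] = 1;
        c->misses++;
    } else {
        c->hits++;
    }
    return c->data[slot][index % LINE_WORDS];
}
```

Every access pays for the tag check in software, so this approach suits access patterns that cannot be predicted well enough for explicit block transfers.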
 
More sophisticated applications can use multiple strategies for different data types.<ref>{{cite web|url=http://www.research.ibm.com/cell/papers/2008_vee_cellgc.pdf|format=PDF|title=Cell GC: Using the Cell Synergistic Processor as a Garbage Collection Coprocessor |date=[[2008-03]]}}</ref>
 
==Compiler-mediated parallelism==
===References===
<references /><!--added under references heading by script-assisted edit-->
* [http://domino.research.ibm.com/cell/ The Cell Project at IBM Research]
* [http://cag.csail.mit.edu/crg/papers/eichenberger05cell.pdf Optimizing Compiler for a CELL Processor]