Power Processing Element

{{Short description|In microprocessor architecture}}
{{Redirect|Power Processing Unit|the electrical circuit device|Power processing unit}}
{{More citations needed|date=April 2020}}
{{POWER, PowerPC, and Power ISA}}
{{Infobox CPU
| name = Power Processing Element
| image = Cell-Processor.jpg
| image_size = frameless|upright=1.25
| caption = The 90 nm Cell BE processor. The PPE is the upper fourth of the processor.
| produced-start = 2005
| produced-end = Present
| slowest = 2.8 | slow-unit = GHz
| fastest = 3.2 | fast-unit = GHz
| size-from = 90 nm
| designfirm = [[IBM]]
| manuf1 = [[IBM]]
| arch = [[PowerPC 2.02]]
| microarch = PPU
| code =
| application = [[Video game console|Gaming Console]], [[High Performance Computing|HPC]]
| predecessor =
| successor = [[IBM A2]]
| variant = [[Cell (microprocessor)|Cell BE]], [[Xenon (processor)|XCPU]], [[Xenon (processor)#XCGPU|XCGPU]], [[Cell processor#PowerXCell 8i|PowerXCell 8i]]
}}
The '''Power Processing Element''' ('''PPE''') comprises a '''Power Processing Unit''' ('''PPU''') and a 512 KB L2 cache. In most instances the PPU is used in a PPE. The PPU is a [[64-bit]] [[multithreading (computer architecture)|dual-threaded]] [[Out-of-order execution|in-order]] [[PowerPC 2.02]] [[microprocessor]] [[multi-core processor|core]] designed by [[IBM]] for use primarily in the [[game console]]s [[PlayStation 3]] and [[Xbox 360]], but it has also found applications in high-performance computing in [[supercomputer]]s such as the record-setting [[IBM Roadrunner]].
 
The PPU is used as a main CPU core in three different processor designs:
* The [[Cell (microprocessor)|Cell Broadband Engine]] (Cell BE), which is used primarily in [[Sony]]'s [[PlayStation 3]] gaming console. It uses the PPE and comes in three versions: a 90 nm, a 65 nm and a 45 nm part.
* The [[Cell (microprocessor)#PowerXCell 8i|PowerXCell 8i]], which is a version of the Cell BE with an enhanced FPU and memory subsystem. It was only manufactured as a single 65 nm version.
* The [[Xenon (processor)|XCPU]], which is used in a three-core configuration with a unified 1 MB L2 cache inside Microsoft's [[Xbox 360]]. It comes in three versions: the 90 nm and 65 nm versions, and the 45 nm [[Xenon (processor)#XCGPU|XCGPU]] with an integrated [[graphics processing unit|graphics processor]] from [[ATI Technologies|ATI]].
 
== Main features ==
 
* 64-bit, dual-threaded core
* 3.2 GHz typical clock rate
* 32 KB [[CPU cache|L1 instruction cache]]
* 32 KB [[CPU cache|L1 data cache]]
* 512 KB unified L2 cache, [[Set-associative#Associativity|8-way set associative]] in the PPE variant
* Compatible with 64-bit PowerPC ISA v.2.02 ([[POWER4]] and [[PowerPC 970]])<ref name="the-ppe">{{cite book |url= https://link.springer.com/chapter/10.1007/978-1-4419-0308-2_2 |title=Practical Computing on the Cell Broadband Engine |first=Sandeep |last=Koranne |chapter=The Power Processing Element (PPE) |chapter-url=https://link.springer.com/chapter/10.1007/978-1-4419-0308-2_2 |isbn=978-1-4419-0307-5 |publisher=[[Springer Science+Business Media]] |date=July 15, 2009|pages=17–34 |doi=10.1007/978-1-4419-0308-2_2 }}</ref>{{rp|page=17}}
* [[AltiVec]] [[SIMD]] functionality
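
As a rough illustration of the L2 figures in the list above, the number of sets in a set-associative cache follows from its capacity, associativity and line size. The sketch below assumes a 128-byte cache line, which this article does not state, and the function name <code>cache_sets</code> is purely illustrative.

```python
def cache_sets(capacity_bytes: int, ways: int, line_bytes: int) -> int:
    """Return the number of sets in a set-associative cache."""
    bytes_per_set = ways * line_bytes
    if capacity_bytes % bytes_per_set != 0:
        raise ValueError("capacity must be a multiple of ways * line size")
    return capacity_bytes // bytes_per_set


L2_CAPACITY = 512 * 1024  # 512 KB unified L2 (from the feature list)
L2_WAYS = 8               # 8-way set associative (from the feature list)
LINE_BYTES = 128          # ASSUMPTION: line size is not given in this article

l2_sets = cache_sets(L2_CAPACITY, L2_WAYS, LINE_BYTES)
print(l2_sets)  # 512
```

With a different line size the set count scales inversely, so the result should be read as a property of the assumed geometry rather than a documented specification.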
 
== Execution units ==
 
* [[Branch predictor|Branch Unit (BRU)]]
* [[Arithmetic logic unit|Fixed Point Integer Unit (FXU)]]
* [[Load–store unit (computing)|Load and Store Unit (LSU)]]
* [[Floating-point unit|Floating-Point Unit (FPU)]]
* [[AltiVec|Vector Media Extension Unit (VMX)]]
 
== In-order ==
{{Main|Out-of-order execution}}
The PPU is an in-order processor, but it has some unique traits that allow it to achieve some of the benefits of out-of-order execution without expensive reordering hardware. On an L1 cache miss, it can continue executing past the miss, stopping only when an instruction actually depends on the outstanding load. It can send up to eight load instructions to the L2 cache out of order. It also has an instruction delay pipe, a side path that allows it to execute instructions that would normally cause [[bubble (computing)|pipeline stalls]] without holding up the rest of the [[instruction pipeline|pipeline]]. The delay pipe is used for the out-of-order loads and stores: accesses that miss the cache are placed there while execution moves on.
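
The benefit of executing past a cache miss can be shown with a toy single-issue, in-order model. This is only a sketch under stated assumptions: the 10-cycle miss penalty, the <code>Instr</code> record and the <code>run</code> function are invented for illustration, and the model ignores the eight-load limit and the delay pipe described above.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str
    srcs: Tuple[str, ...]
    miss: bool = False  # True if this is a load that misses the L1 cache


MISS_LATENCY = 10  # ASSUMPTION: illustrative miss penalty in cycles


def run(program, execute_past_miss: bool) -> int:
    """Return the total cycles for a toy single-issue in-order core."""
    ready_at = {}  # register name -> cycle at which its value is available
    cycle = 0
    for ins in program:
        # In-order issue: wait until every source operand is available.
        for src in ins.srcs:
            cycle = max(cycle, ready_at.get(src, 0))
        issue = cycle
        if ins.miss:
            ready_at[ins.dest] = issue + MISS_LATENCY
            if not execute_past_miss:
                # Conventional behaviour: block until the data returns.
                cycle = issue + MISS_LATENCY
                continue
        else:
            ready_at[ins.dest] = issue + 1
        cycle = issue + 1
    return cycle


program = [
    Instr("r1", (), miss=True),  # load that misses the L1 cache
    Instr("r2", ("r3",)),        # independent of the load
    Instr("r4", ("r5",)),        # independent of the load
    Instr("r6", ("r1",)),        # first instruction that needs the load
]

print(run(program, execute_past_miss=True))   # 11
print(run(program, execute_past_miss=False))  # 13
```

In this toy program the two independent instructions overlap with the miss, so the core stalls only at the first dependent instruction.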
 
== The PPE's pipeline ==
The PPE has a 23-stage general pipeline, with an additional 11 stages possible for microcode and an additional 4 stages possible for branch prediction.<ref>{{cite web |url=http://www.ibm.com/developerworks/library/pa-cellperf/ |title=Cell Broadband Engine Architecture and its first implementation |first1=Thomas |last1=Chen |first2=Ram |last2=Raghavan |first3=Jason |last3=Dale |first4=Eiji |last4=Iwata |website=IBM DeveloperWorks |archive-url=https://web.archive.org/web/20151208051244/http://www.ibm.com/developerworks/library/pa-cellperf/ |archive-date=2015-12-08 |url-status=dead}}</ref>
 
== Multithreading ==
The PPU runs two [[simultaneous multithreading|hardware threads]] simultaneously. The [[Processor register|main registers]] for code execution are duplicated, as are the exception and interrupt-handling registers and several essential arrays and queues. The two threads can generate exceptions simultaneously and perform branch prediction on their individual branch histories. The execution engine and caches are not duplicated, however, so it is still a single-core design.<ref name="the-ppe" />
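
The split between duplicated and shared resources can be pictured with a small data model. This is a hypothetical sketch, not a description of the hardware: the class names <code>HardwareThread</code> and <code>ToyPPU</code> and the <code>execute</code> method are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class HardwareThread:
    """State the text describes as duplicated per hardware thread."""
    registers: Dict[str, int] = field(default_factory=dict)
    branch_history: List[bool] = field(default_factory=list)


@dataclass
class ToyPPU:
    """Two thread contexts in front of one shared execution engine."""
    threads: Tuple[HardwareThread, HardwareThread] = field(
        default_factory=lambda: (HardwareThread(), HardwareThread())
    )
    # Single log standing in for the one, non-duplicated execution engine.
    issue_log: List[Tuple[int, str]] = field(default_factory=list)

    def execute(self, thread_id: int, reg: str, value: int) -> None:
        """Write a register for one thread via the shared engine."""
        self.issue_log.append((thread_id, reg))
        self.threads[thread_id].registers[reg] = value


ppu = ToyPPU()
ppu.execute(0, "r1", 10)
ppu.execute(1, "r1", 20)  # same register name, different thread

print(ppu.threads[0].registers["r1"])  # 10
print(ppu.threads[1].registers["r1"])  # 20
print(len(ppu.issue_log))              # 2
```

Each thread keeps its own copy of <code>r1</code>, while both writes pass through the same engine, which is why the design still counts as a single core.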
 
== Floating-point capacity ==
Its [[64-bit]] [[double-precision floating-point format|double-precision]] floating-point unit and [[128-bit]] VMX unit (using the [[AltiVec]] instruction set) can together perform a theoretical 12 floating-point operations per cycle, as its floating-point unit can do floating-point multiply-adds and is no narrower than 64 bits. At 3.2 GHz, that gives 3.2 billion clock cycles per second × 12 = 38.4 billion floating-point operations per second.
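
The peak figure is a plain product of clock rate and operations per cycle, which the following sketch reproduces. The helper name <code>peak_gflops</code> is illustrative, and the result is a theoretical ceiling, not a measured throughput.

```python
def peak_gflops(clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: clock rate (GHz) times operations per cycle."""
    return clock_ghz * flops_per_cycle


FLOPS_PER_CYCLE = 12  # theoretical figure quoted in the text

typical = peak_gflops(3.2, FLOPS_PER_CYCLE)
print(f"{typical:.1f} GFLOPS")  # 38.4 GFLOPS
```

The same function gives the ceiling for any other clock rate, since only the first argument changes.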
 
The PPU is enhanced in the [[Cell processor#PowerXCell 8i|PowerXCell 8i]] processor to be able to perform single-cycle [[Double-precision floating-point format|double-precision floating-point]] operations, tailored for high-performance computing in supercomputers.
 
The VMX unit in the [[Xenon (processor)|XCPU]] in the Xbox 360 is enhanced with 128 [[Processor register|registers]] and is not entirely compatible with regular AltiVec.
==References==
{{Reflist}}
 
{{Cell microprocessor segments}}
 
[[Category:Cell BE architecture]]
[[Category:IBM microprocessors]]
[[Category:PowerPC architecture]]
[[Category:Xbox 360 hardware]]