Out-of-order execution: Difference between revisions

 
=== Decoupling ===
Smith also researched how to make different execution units operate more independently of each other and of the memory, front-end, and branching.<ref>{{cite journal |last1=Smith |first1=James E. |author1-link=James E. Smith (engineer) |title=Decoupled Access/Execute Computer Architectures |journal=ACM Transactions on Computer Systems |date=November 1984 |volume=2 |issue=4 |pages=289–308 |doi=10.1145/357401.357403 |s2cid=13903321 |url=https://course.ece.cmu.edu/~ece447/s15/lib/exe/fetch.php?media=p289-smith.pdf}}</ref> He implemented those ideas in the [[Astronautics Corporation of America|Astronautics]] ZS-1 (1988), which decoupled the integer/load/store pipeline from the floating-point pipeline, allowing inter-pipeline reordering. The ZS-1 was also capable of executing loads ahead of preceding stores. In his 1984 paper, he argued that enforcing precise exceptions only on the integer/memory pipeline should be sufficient for many use cases, as it even permits [[virtual memory]]. Each pipeline had an instruction buffer to decouple it from the instruction decoder and so prevent stalling of the front end. To further decouple memory access from execution, each of the two pipelines was associated with two addressable [[FIFO (computing and electronics)|queues]] that effectively performed limited register renaming.<ref name=zs1>{{cite journal |last1=Smith |first1=James E. |author1-link=James E. Smith (engineer) |title=Dynamic Instruction Scheduling and the Astronautics ZS-1 |journal=Computer |url=https://course.ece.cmu.edu/~ece740/f13/lib/exe/fetch.php?media=00030730.pdf |pages=21–35 |doi=10.1109/2.30730 |date=July 1989 |volume=22 |issue=7 |s2cid=329170 }}</ref> A similar decoupled architecture had been used slightly earlier in the Culler 7.<ref>{{cite web |last1=Smotherman |first1=Mark |title=Culler-7 |url=https://people.computing.clemson.edu/~mark/culler.html |website=[[Clemson University]]}}</ref> The ZS-1's ISA, like IBM's subsequent POWER, aided the early execution of branches.<!--[[User:Kvng/RTH]]-->
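
The benefit of placing a FIFO queue between memory access and execution can be illustrated with a toy cycle-level model. The sketch below is not a model of the ZS-1: the function name <code>run_decoupled</code>, the three-cycle load latency, and the queue depths are all assumptions chosen for illustration. It shows only the general idea that an access side which can run ahead and fill a queue hides load latency from the execute side.

```python
from collections import deque


def run_decoupled(addresses, memory, load_latency=3, queue_depth=4):
    """Toy model (hypothetical parameters): an access unit issues loads
    into a FIFO queue while an execute unit consumes them in order.

    Returns (total_cycles, values_consumed).
    """
    load_queue = deque()          # entries: (cycle_data_arrives, value)
    to_issue = deque(addresses)   # loads the access unit still has to issue
    consumed = []
    cycle = 0
    while to_issue or load_queue:
        cycle += 1
        # Execute side: consume the queue head once its data has arrived.
        if load_queue and load_queue[0][0] <= cycle:
            consumed.append(load_queue.popleft()[1])
        # Access side: run ahead and issue the next load if a slot is free.
        if to_issue and len(load_queue) < queue_depth:
            addr = to_issue.popleft()
            load_queue.append((cycle + load_latency, memory[addr]))
    return cycle, consumed


memory = [10, 20, 30, 40]
print(run_decoupled([0, 1, 2, 3], memory))                 # (7, [10, 20, 30, 40])
print(run_decoupled([0, 1, 2, 3], memory, queue_depth=1))  # (13, [10, 20, 30, 40])
```

With a queue depth of one, each load must complete before the next is issued, which approximates a tightly coupled design; with a depth of four, the loads overlap and the same work finishes in roughly half the cycles in this model.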
 
=== Research comes to fruition ===
With the [[POWER1]] (1990), IBM returned to out-of-order execution. It was the first processor to combine register renaming (though again only for floating-point registers) with precise exceptions. It uses a ''physical register file'' (i.e. a dynamically remapped file holding both uncommitted and committed values) instead of a datafull reorder buffer, but the ability to cancel instructions is needed only in the branch unit, which implements a history buffer (named ''program counter stack'' by IBM) to undo changes to the count, link, and condition registers. Even for floating-point instructions, the reordering capability is very limited: because POWER1 cannot reorder floating-point arithmetic instructions (their results become available in order), their destination registers are not renamed. POWER1 also lacks the [[reservation station]]s needed for out-of-order use of the same execution unit.<ref>{{cite journal |last1=Grohoski |first1=Gregory F. |title=Machine organization of the IBM RISC System/6000 processor |journal=[[IBM Journal of Research and Development]] |date=January 1990 |volume=34 |issue=1 |pages=37–58 |doi=10.1147/rd.341.0037 |archive-url=https://web.archive.org/web/20050109191456/http://www.research.ibm.com/journal/rd/341/ibmrd3401F.pdf|url=http://www.research.ibm.com/journal/rd/341/ibmrd3401F.pdf|archive-date=January 9, 2005}}</ref><ref>{{cite journal |last1=Smith |first1=James E. |last2=Sohi |first2=Gurindar S. |author1-link=James E. Smith (engineer) |title=The Microarchitecture of Superscalar Processors |journal=Proceedings of the IEEE |date=December 1995 |volume=83 |issue=12 |url=https://courses.cs.washington.edu/courses/cse471/01au/ss_cgi.pdf |page=1617|doi=10.1109/5.476078 }}</ref> The next year, IBM's [[IBM System/390|ES/9000]] model 900 added register renaming for the general-purpose registers as well. 
It also has [[reservation station]]s with six entries for the dual integer unit (each cycle, up to two of the six instructions can be selected and executed) and six entries for the FPU. Other units have simple FIFO queues. The reordering distance is up to 32 instructions.<ref>{{cite journal|url=http://www.research.ibm.com/journal/rd/364/ibmrd3604N.pdf|title=Design of the IBM Enterprise System/9000 high-end processor|first=John S.|last=Liptay|journal=[[IBM Journal of Research and Development]]|volume=36|issue=4|date=July 1992| pages=713–731 | doi=10.1147/rd.364.0713 |archive-url=https://web.archive.org/web/20050117034801/http://www.research.ibm.com/journal/rd/364/ibmrd3604N.pdf|archive-date=January 17, 2005}}</ref> The A19 of [[Unisys]]' [[Burroughs Large Systems|A-series of mainframes]] was also released in 1991 and was claimed to have out-of-order execution; one analyst called the A19's technology three to five years ahead of the competition.<ref>{{cite news |last1=Ziegler |first1=Bart |title=Unisys Unveils 'Top Gun' Mainframe Computers |url=https://apnews.com/article/fbb84876bd4b60cee52e5c3622ea0d13 |work=AP News |date=March 7, 1991}}</ref><ref>{{cite news |title=Unisys' New Mainframe Leaves Big Blue In The Dust |url=https://www.bloomberg.com/news/articles/1991-03-24/unisys-new-mainframe-leaves-big-blue-in-the-dust |work=Bloomberg |date=March 25, 1991 |quote=The new A19 relies on "super-scalar" techniques from scientific computers to execute many instructions concurrently. The A19 can overlap as many as 140 operations, more than 10 times as many as conventional mainframes can.}}</ref><!--[[User:Kvng/RTH]]-->
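
The selection step of a six-entry reservation station feeding a dual unit, as described above for the ES/9000's integer unit, can be sketched as follows. Only the capacity of six and the issue width of two come from the text; the oldest-first selection policy, the single-cycle result latency, and all names (<code>Entry</code>, <code>ReservationStation</code>, <code>run</code>) are assumptions made for illustration and are not taken from IBM's design.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # identity comparison, so entries are never confused
class Entry:
    name: str        # label of the instruction
    dest: str        # register it writes
    sources: tuple   # registers it reads


class ReservationStation:
    def __init__(self, capacity=6, issue_width=2):
        self.capacity = capacity
        self.issue_width = issue_width
        self.entries = []  # kept in program order, oldest first

    def insert(self, entry):
        """Accept an instruction if a slot is free."""
        if len(self.entries) >= self.capacity:
            return False
        self.entries.append(entry)
        return True

    def select(self, ready_regs):
        """Pick up to issue_width entries whose operands are all ready
        (assumed policy: oldest ready entries first)."""
        ready = [e for e in self.entries
                 if all(s in ready_regs for s in e.sources)]
        chosen = ready[:self.issue_width]
        for e in chosen:
            self.entries.remove(e)
        return chosen


def run(program, initially_ready):
    """Issue a program cycle by cycle; results are assumed ready next cycle."""
    rs = ReservationStation()
    ready = set(initially_ready)
    pending = list(program)
    schedule = []
    while pending or rs.entries:
        while pending and rs.insert(pending[0]):
            pending.pop(0)
        issued = rs.select(ready)
        if not issued:
            raise RuntimeError("no instruction can issue")
        schedule.append([e.name for e in issued])
        ready.update(e.dest for e in issued)
    return schedule


program = [
    Entry("i1", "r1", ("r10",)),
    Entry("i2", "r2", ("r1",)),       # depends on i1
    Entry("i3", "r3", ("r11",)),      # independent of i1 and i2
    Entry("i4", "r4", ("r2", "r3")),  # depends on i2 and i3
]
print(run(program, {"r10", "r11"}))  # [['i1', 'i3'], ['i2'], ['i4']]
```

In this run, <code>i3</code> issues in the first cycle alongside <code>i1</code>, ahead of the older <code>i2</code>, whose operand is not yet available. A simple FIFO queue, by contrast, would have stalled behind <code>i2</code>.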
 
=== Wide adoption ===