The first [[Superscalar processor|superscalar]] [[Microprocessor|single-chip processors]] ([[Intel i960]]CA in 1989) used simple scoreboard scheduling, as the CDC 6600 had a quarter of a century earlier, but in 1992–1996 techniques advanced rapidly, enabled by [[Moore's law|increasing transistor counts]], and spread down to personal computers. [[Motorola 88110]] (1992) used a history buffer to revert instructions.<ref>{{cite journal |last1=Ullah |first1=Nasr |last2=Holle |first2=Matt |title=The MC88110 Implementation of Precise Exceptions in a Superscalar Architecture |journal=ACM Sigarch Computer Architecture News |url=https://dl.acm.org/doi/pdf/10.1145/152479.152482 |publisher=Motorola Inc. |format=pdf |date=March 1993|volume=21 |pages=15–25 |doi=10.1145/152479.152482 |s2cid=7036627 }}</ref> Loads could be executed ahead of preceding stores. While stores and branches were waiting to start execution, subsequent instructions of other types could keep flowing through all the pipeline stages, including writeback. 
The 12-entry capacity of the history buffer placed a limit on the reorder distance.<ref>{{cite web |last1=Smotherman |first1=Mark |title=Motorola MC88110 Overview |url=http://www.m88k.com/orig/misc/msmotherman-88110.txt |date=29 April 1994}}</ref><ref>{{cite journal |last1=Diefendorff |first1=Keith |author1-link=Keith Diefendorff |last2=Allen |first2=Michael |title=Organization of the Motorola 88110 superscalar RISC microprocessor |journal=IEEE Micro |date=April 1992 |volume=12 |issue=2 |pages=40–63 |doi=10.1109/40.127582 |s2cid=25668727 |url=http://cjat.ir/images/PDF_English/20143.pdf |archive-url=https://web.archive.org/web/20221021015941/http://cjat.ir/images/PDF_English/20143.pdf |archive-date=2022-10-21 }}</ref><ref>{{cite book |last1=Smotherman |first1=Mark |last2=Chawla |first2=Shuchi |last3=Cox |first3=Stan |last4=Malloy |first4=Brian |title=Proceedings of the 26th Annual International Symposium on Microarchitecture |chapter=Instruction scheduling for the Motorola 88110 |date=December 1993 |pages=257–262 |doi=10.1109/MICRO.1993.282761 |isbn=0-8186-5280-2 |s2cid=52806289 |chapter-url=https://dl.acm.org/doi/epdf/10.5555/255235.255299}}</ref> [[PowerPC_600#PowerPC_601|PowerPC 601]] (1993) was an evolution of the [[RISC Single Chip]], itself a simplification of POWER1. The 601 permitted branch and floating-point instructions to overtake the integer instructions already in the fetched-instruction queue, the lowest four entries of which were scanned for dispatchability. In the case of a cache miss, loads and stores could be reordered. Only the link and count registers could be renamed.{{Refn|<ref>{{cite web |title=PowerPC™ 601 RISC Microprocessor Technical Summary |url=https://www.nxp.com/docs/en/data-sheet/MPC601.pdf |access-date=23 October 2022}}</ref><ref>[[Charles R. Moore (computer engineer)|Moore, Charles R.]]; Becker, Michael C. et al. 
{{cite journal |title=The PowerPC 601 microprocessor |journal=IEEE Micro |date=September 1993 |volume=13 |issue=5 |url=https://www.researchgate.net/publication/3214696}}</ref><ref>{{cite web |last1=Diefendorff |first1=Keith |author1-link=Keith Diefendorff |title=PowerPC 601 Microprocessor |url=https://old.hotchips.org/wp-content/uploads/hc_archives/hc05/3_Tue/HC05.S8/HC05.8.2-Diefendorff-Motorola-PowerPC601.pdf |publisher=[[Hot Chips]] |date=August 1993}}</ref><ref>{{cite journal |last1=Smith |first1=James E. |last2=Weiss |first2=Shlomo |author1-link=James E. Smith (engineer) |title=PowerPC 601 and Alpha 21064: A Tale of Two RISCs |journal=IEEE Computer |date=June 1994 |volume=27 |issue=6 |pages=46–58 |doi=10.1109/2.294853 |s2cid=1114841 |url=https://www.eecg.utoronto.ca/~moshovos/ACA05/read/ppc601and21064.pdf}}</ref><ref>{{cite journal |last1=Sima |first1=Dezsö |title=The design space of register renaming techniques |url=https://www.researchgate.net/publication/3215151 |journal=IEEE Micro |date=September–October 2000 |volume=20 |issue=5 |pages=70–83 |doi=10.1109/40.877952 |citeseerx=10.1.1.387.6460 |s2cid=11012472 }}</ref>}} In the fall of 1994 [[NexGen]] and [[AIM alliance|IBM with Motorola]] brought the renaming of general-purpose registers to single-chip CPUs. NexGen's Nx586 was the first [[x86]] processor capable of out-of-order execution, accomplished with [[micro-operation|micro-OPs]]. The reordering distance is up to 14 micro-OPs.<ref>{{cite web |last1=Gwennap |first1=Linley |title=NexGen Enters Market with 66-MHz Nx586 |url=https://www.ardent-tool.com/CPU/docs/MPR/080403.pdf |website=[[Microprocessor Report]] |archive-url=https://web.archive.org/web/20211202223054/https://www.ardent-tool.com/CPU/docs/MPR/080403.pdf |archive-date=2 December 2021 |date=28 March 1994}}</ref> [[PowerPC_600#PowerPC_603|PowerPC 603]] renamed both the general-purpose and FP registers. 
Each of the four non-branch execution units can have one instruction wait in front of it without blocking the instruction flow to the other units. A five-entry [[reorder buffer]] lets no more than four instructions overtake an unexecuted instruction. Because of a store buffer, a load can access cache ahead of a preceding store.<ref>{{cite journal |last1=Burgess |first1=Brad |last2=Ullah |first2=Nasr |last3=Van Overen |first3=Peter |last4=Ogden |first4=Deene |title=The PowerPC 603 microprocessor |journal=Communications of the ACM |date=June 1994 |volume=37 |issue=6 |pages=34–42 |doi=10.1145/175208.175212 |s2cid=34385975 |doi-access=free }}</ref><ref>{{cite web |title=PowerPC™ 603 RISC Microprocessor Technical Summary |url=https://www.nxp.com/docs/en/data-sheet/MPC603.pdf |access-date=27 October 2022}}</ref>
[[PowerPC_600#PowerPC_604|PowerPC 604]] (1995) was the first single-chip processor with [[execution unit]]-level reordering, as three out of its six units each had a two-entry reservation station permitting the newer entry to execute before the older. The reorder buffer capacity is 16 instructions. A four-entry load queue and a six-entry store queue track the reordering of loads and stores upon cache misses.<ref>{{cite journal |last1=Song |first1=S. Peter |last2=Denman |first2=Marvin |last3=Chang |first3=Joe |title=The PowerPC 604 RISC microprocessor |journal=IEEE Micro |date=October 1994 |volume=14 |issue=5 |page=8 |doi=10.1109/MM.1994.363071 |s2cid=11603864 |url=https://www.complang.tuwien.ac.at/andi/tuonly/SkriptPPC604.pdf}}</ref> [[HAL SPARC64]] (1995) exceeded the reordering capacity of the [[IBM System/390|ES/9000]] model 900 with three 8-entry reservation stations for the integer, floating-point, and [[address generation unit]]s, and a 12-entry reservation station for load/store, which permits greater reordering of cache/memory access than preceding processors. Up to 64 instructions can be in a reordered state at a time.<ref>{{cite web |title=SPARC64+: HAL's Second Generation 64-bit SPARC Processor |url=https://old.hotchips.org/wp-content/uploads/hc_archives/hc07/2_Mon/HC7.S3/HC7.3.2.pdf |website=[[Hot Chips]]}}</ref><ref>{{cite web |url=https://www.irisa.fr/caps/projects/TechnologicalSurvey/micro/PI-957-html/section2_8_7.html |website=[[Research Institute of Computer Science and Random Systems]] |title=Le Sparc64 |language=French}}</ref> [[Pentium Pro]] (1995) introduced a ''[[reservation station|unified reservation station]]'', whose 20-micro-OP capacity permitted very flexible reordering, backed by a 40-entry reorder buffer. 
Loads can be reordered ahead of both loads and stores.<ref>{{cite web |last1=Gwennap |first1=Linley |title=Intel's P6 Uses Decoupled Superscalar Design |url=http://www.cs.cmu.edu/afs/cs/academic/class/15213-f01/docs/mpr-p6.pdf |website=[[Microprocessor Report]] |date=16 February 1995}}</ref>
The practically attainable [[instructions per cycle|per-cycle rate of execution]] rose further as full out-of-order execution was adopted by [[Silicon Graphics|SGI]]/[[MIPS Technologies|MIPS]] ([[R10000]]) and [[Hewlett-Packard|HP]] [[PA-RISC]] ([[PA-8000]]) in 1996. The same year [[Cyrix 6x86]] and [[AMD K5]] brought advanced reordering techniques into mainstream [[personal computer]]s. Since [[DEC Alpha]] gained out-of-order execution in 1998 ([[Alpha 21264]]), the top-performing out-of-order processor cores have been unmatched by in-order cores other than [[Hewlett-Packard|HP]]/[[Intel]] [[Itanium]] 2 and [[IBM]] [[POWER6]], though the latter had an out-of-order [[floating-point unit]].<ref>Le, Hung Q. et al. {{cite journal |title=IBM POWER6 microarchitecture |journal=IBM Journal of Research and Development |date=November 2007 |volume=51 |issue=6 |url=https://course.ece.cmu.edu/~ece742/f12/lib/exe/fetch.php?media=le_power6.pdf}}</ref> The other high-end in-order processors fell far behind, namely [[Sun Microsystems|Sun]]'s [[UltraSPARC III]]/[[UltraSPARC IV|IV]] and IBM's [[Mainframe computer|mainframes]], which had lost out-of-order execution capability for the second time and remained in-order into the [[IBM z10|z10]] generation. Later big in-order processors focused on multithreaded performance, but eventually the [[SPARC T series]] and [[Xeon Phi]] switched to out-of-order execution in 2011 and 2016 respectively.