Out-of-order execution: Difference between revisions

Revision as of 15:17, 26 July 2025 by Lkcl (→Precise exceptions: factual incorrect about 88100. Mitch Alsup rocks.)
Latest revision as of 23:23, 11 August 2025 by Folkezoft (m →Precise exceptions: Tag Bare URL PDFs using AutoWikiBrowser)
(2 intermediate revisions by 2 users not shown)
Line 18:
To have [[precise exception]]s, the proper in-order state of the program's execution must be available upon an exception. By 1985 various approaches were developed as described by [[James E. Smith (engineer)\|James E. Smith]] and Andrew R. Pleszkun.<ref name="smith">{{cite journal \|last1=Smith \|first1=James E. \|last2=Pleszkun \|first2=Andrew R. \|author1-link=James E. Smith (engineer) \|title=Implementation of precise interrupts in pipelined processors \|journal=12th ISCA\|date=June 1985 \|url=https://dl.acm.org/doi/epdf/10.5555/327010.327125}}<br/>(Expanded version published in May 1988 as [https://www.cs.virginia.edu/~evans/greatworks/smith.pdf ''Implementing Precise Interrupts in Pipelined Processors''].)</ref> The [[CDC Cyber 205]] was a precursor, as upon a virtual memory interrupt the entire state of the processor (including the information on the partially executed instructions) is saved into an ''invisible exchange package'', so that it can resume at the same state of execution.<ref>{{cite web \|last1=Moudgill \|first1=Mayan \|last2=Vassiliadis \|first2=Stamatis \|title=On Precise Interrupts \|page=18 \|date=January 1996 \|citeseerx=10.1.1.33.3304 \|url=https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3304&rep=rep1&type=pdf \|archive-url=https://web.archive.org/web/20221013035408/https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.33.3304&rep=rep1&type=pdf \|archive-date=13 October 2022 \|format=pdf}}</ref> However, to make all exceptions precise, there has to be a way to cancel the effects of instructions.
The CDC Cyber 990 (1984) implements precise interrupts by using a history buffer, which holds the old (overwritten) values of registers that are restored when an exception necessitates the reverting of instructions.<ref name="smith"/> Through simulation, Smith determined that adding a reorder buffer (or history buffer or equivalent) to the [[Cray-1S]] would reduce the performance of executing the first 14 [[Livermore loops]] (unvectorized) by only 3%.<ref name="smith"/> Important academic research in this subject was led by [[Yale Patt]] with his [[HPSm]] simulator.<ref>{{cite book \|url=http://dl.acm.org/citation.cfm?id=17391 \|title=HPSm, a high performance restricted data flow architecture having minimal functionality \|work=ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture \|isbn=978-0-8186-0719-6 \|pages=297–306 \|date=1986 \|access-date=2013-12-06 \|author-first1=W. \|author-last1=Hwu \|author-first2=Yale N. \|author-last2=Patt \|author-link2=Yale Patt \|publisher=[[Association for Computing Machinery\|ACM]]}}</ref> In the 1980s many early [[RISC]] microprocessors had out-of-order writeback to the registers, invariably resulting in imprecise exceptions. The [[Motorola 88100]] was one of the few early microprocessors that did not suffer from imprecise exceptions despite out-of-order writes, although it did allow both precise and imprecise floating-point exceptions.<ref>http://www.bitsavers.org/components/motorola/88000/MC88100_RISC_Microprocessor_Users_Manual_2ed_1990.pdf {{Bare URL PDF\|date=August 2025}}</ref> Instructions started execution in order, but some (e.g. floating-point) took more cycles to complete execution. However, the single-cycle execution of the most basic instructions greatly reduced the scope of the problem compared to the CDC 6600.
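The history-buffer idea described above can be illustrated with a small sketch. This is a simplified model rather than the Cyber 990's actual design: the class <code>HistoryBuffer</code>, its method names, and the register count are hypothetical, and it assumes in-order issue with no two in-flight instructions writing the same register.

```python
class HistoryBuffer:
    """Toy model of a history buffer (hypothetical names, not a real design)."""

    def __init__(self, num_regs=4):
        self.regs = [0] * num_regs
        self.entries = []  # program order: (seq, dest register, old value)

    def issue(self, seq, dest):
        # In-order issue: remember the value the instruction will overwrite.
        self.entries.append((seq, dest, self.regs[dest]))

    def complete(self, dest, value):
        # Completion may be out of order; the register is updated immediately.
        self.regs[dest] = value

    def retire(self, seq):
        # The instruction finished without a fault, so its entry is freed.
        self.entries = [e for e in self.entries if e[0] != seq]

    def rollback(self, fault_seq):
        # Restore old values, youngest first, for the faulting instruction
        # and every instruction issued after it.
        while self.entries and self.entries[-1][0] >= fault_seq:
            _, dest, old = self.entries.pop()
            self.regs[dest] = old


hb = HistoryBuffer()
hb.regs[1], hb.regs[2] = 10, 20
hb.issue(0, 1)       # slow instruction that will write r1
hb.issue(1, 2)       # faster instruction that will write r2
hb.complete(2, 99)   # the younger instruction finishes first
hb.rollback(0)       # instruction 0 faults: undo it and everything younger
print(hb.regs)       # [0, 10, 20, 0]
```

After the rollback the register file matches the state before instruction 0 issued, which is the in-order state a precise exception requires. A reorder buffer reaches the same goal from the other direction by holding new values back until they can be committed in order.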
=== Decoupling ===
Line 33:
The practically attainable [[instructions per cycle\|per-cycle rate of execution]] rose further as full out-of-order execution was adopted by [[Silicon Graphics\|SGI]]/[[MIPS Technologies\|MIPS]] ([[R10000]]) and [[Hewlett-Packard\|HP]] [[PA-RISC]] ([[PA-8000]]) in 1996. The same year [[Cyrix 6x86]] and [[AMD K5]] brought advanced reordering techniques into mainstream personal computers. Since [[DEC Alpha]] gained out-of-order execution in 1998 ([[Alpha 21264]]), the top-performing out-of-order processor cores have been unmatched by in-order cores other than [[Hewlett-Packard\|HP]]/[[Intel]] [[Itanium 2]] and [[IBM POWER6]], though the latter had an out-of-order [[floating-point unit]].<ref>Le, Hung Q. et al. {{cite journal \|title=IBM POWER6 microarchitecture \|journal=IBM Journal of Research and Development \|date=November 2007 \|volume=51 \|issue=6 \|url=https://course.ece.cmu.edu/~ece742/f12/lib/exe/fetch.php?media=le_power6.pdf}}</ref> The other high-end in-order processors fell far behind, namely [[Sun Microsystems\|Sun]]'s [[UltraSPARC III]]/[[UltraSPARC IV\|IV]], and IBM's [[mainframe]]s, which had lost the out-of-order execution capability for the second time, remaining in-order into the [[IBM z10\|z10]] generation. Later big in-order processors were focused on multithreaded performance, but eventually the [[SPARC T series]] and [[Xeon Phi]] changed to out-of-order execution in 2011 and 2016 respectively.{{cn\|reason=No mention of out-of-order execution in either linked article.\|date=July 2024}} Almost all processors for phones and other lower-end applications remained in-order until {{circa\|2010}}.
First, [[Qualcomm]]'s [[Scorpion (processor)\|Scorpion]] (reordering distance of 32) shipped in [[Qualcomm Snapdragon\|Snapdragon]],<ref>{{cite web \|last1=Mallia \|first1=Lou \|title=Qualcomm High Performance Processor Core and Platform for Mobile Applications \|url=http://rtcgroup.com/arm/2007/presentations/253%20-%20ARM_DevCon_2007_Snapdragon_FINAL_20071004.pdf \|archive-url=https://web.archive.org/web/20131029193001/http://rtcgroup.com/arm/2007/presentations/253%20-%20ARM_DevCon_2007_Snapdragon_FINAL_20071004.pdf \|archive-date=29 October 2013}}</ref> and a bit later [[Arm (company)\|Arm]]'s [[ARM Cortex-A9\|A9]] succeeded [[ARM Cortex-A8\|A8]]. For low-end [[x86]] [[personal computer]]s, the in-order [[Bonnell microarchitecture]] in early [[Intel Atom]] processors was first challenged by [[AMD]]'s [[Bobcat microarchitecture]], and in 2013 was succeeded by the out-of-order [[Silvermont microarchitecture]].<ref>{{cite web \|url=http://www.anandtech.com/show/6936/intels-silvermont-architecture-revealed-getting-serious-about-mobile/2 \|archive-url=https://archive.today/20161222023104/http://www.anandtech.com/show/6936/intels-silvermont-architecture-revealed-getting-serious-about-mobile/2 \|url-status=dead \|archive-date=December 22, 2016 \|website=AnandTech \|title=Intel's Silvermont Architecture Revealed: Getting Serious About Mobile \|author=Anand Lal Shimpi \|date=2013-05-06}}</ref> Because the complexity of out-of-order execution precludes achieving the lowest power consumption, cost and size, in-order execution is still prevalent in [[microcontroller]]s and [[embedded system]]s, as well as in phone-class cores such as Arm's [[ARM Cortex-A55\|A55]] and [[ARM Cortex-A510\|A510]] in [[big.LITTLE]] configurations.
== Basic concept ==
Line 83:
Is there an actual results queue or are the results written directly into a register file?
For the latter, the queueing function is handled by register maps that hold the register renaming information for each instruction in flight.
:Early Intel out-of-order processors use a results queue called a [[reorder buffer]],{{efn\|Intel [[P6 (microarchitecture)\|P6]] family microprocessors have both a reorder buffer (ROB) and a [[register renaming\|register alias table]] (RAT). The ROB was motivated mainly by branch misprediction recovery. The Intel [[P6 (microarchitecture)\|P6]] family is among the earliest out-of-order microprocessors but was supplanted by the [[NetBurst]] architecture. Years later, NetBurst proved to be a dead end due to its long pipeline that assumed the possibility of much higher operating frequencies. Materials were not able to match the design's ambitious clock targets due to thermal issues, and later designs based on NetBurst, namely Tejas and Jayhawk, were cancelled. Intel reverted to the P6 design as the basis of the [[Intel Core (microarchitecture)\|Core]] and [[Nehalem (microarchitecture)\|Nehalem]] microarchitectures.}} while most later out-of-order processors use register maps.{{efn\|The succeeding [[Sandy Bridge]], [[Ivy Bridge (microarchitecture)\|Ivy Bridge]], and [[Haswell (microarchitecture)\|Haswell]] microarchitectures are a departure from the reordering techniques used in P6 and employ reordering techniques from the [[Alpha 21264\|EV6]] and the [[Pentium 4\|P4]] but with a somewhat shorter pipeline.<ref>{{cite web \|author-last=Kanter \|author-first=David \|date=2010-09-25 \|title=Intel's Sandy Bridge Microarchitecture \|url=http://www.realworldtech.com/sandy-bridge/10/}}</ref><ref name="urlThe Haswell Front End - Intels Haswell Architecture Analyzed: Building a New PC and a New Intel">{{cite web \|url=https://www.anandtech.com/show/6355/intels-haswell-architecture/6 \|archive-url=https://web.archive.org/web/20121007163104/http://www.anandtech.com/show/6355/intels-haswell-architecture/6 \|url-status=dead \|archive-date=October 7, 2012 \|title=The Haswell Front End - Intel's Haswell Architecture Analyzed: Building a New PC and a New Intel }}</ref>}}
== See also ==