Out-of-order execution: Difference between revisions

Content deleted Content added
m Add {{Redirect|OOE||Ooe (disambiguation)}}
start review: ce for clarity. WP:ASTONISH. rm unnec paren.
Line 1:
{{Short description|Computing paradigm to improve computational efficiency}}
{{Redirect|OOE||Ooe (disambiguation)}}
 
In [[computer engineering]], '''out-of-order execution''' (or more formally '''dynamic execution''') is a [[paradigm]] used in most high-performance [[central processing unit]]s to make use of [[instruction cycle]]s that would otherwise be wasted. In this paradigm, a processor executes [[Instruction (computing)|instructions]] in an order governed by the availability of input data and execution units,<ref>{{cite book |author-last=Kukunas |author-first=Jim |date=2015 |title=Power and Performance: Software Analysis and Optimization |url=https://books.google.com/books?id=X-WcBAAAQBAJ&pg=PA37 |publisher=Morgan Kaufman |page=37 |isbn=9780128008140}}</ref> rather than by their original order in a program.<ref>{{cite web |url=http://courses.cs.washington.edu/courses/csep548/06au/lectures/introOOO.pdf |title=Out-of-order execution |date=2006 |quote=don't wait for previous instructions to execute if this instruction does not depend on them |access-date=2014-01-17 |publisher=cs.washington.edu}}</ref><ref name="Regis High School 2011">{{cite web | title=The Centennial Celebration | website=Regis High School | date=2011-03-14 | url=https://www.regis.org/2014/multimedia/tomasulo.cfm | access-date=2022-06-25|quote=The algorithm "allows sequential instructions that would normally be stalled due to certain dependencies to execute non-sequentially" (also known as out of order execution).}}</ref> In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.<ref>{{cite web |url=http://www.pcguide.com/ref/cpu/arch/int/featOOE-c.html |quote=This flexibility improves performance since it allows execution with less 'waiting' time. |title=Out-of-order Execution |publisher=pcguide.com |access-date=2014-01-17}}</ref>
 
== History ==
Out-of-order execution is a restricted form of [[dataflow architecture|data flow]] computation, which was a major research area in [[computer architecture]] in the 1970s and early 1980s.
 
=== Early use in supercomputers ===
The first machine to use out-of-order execution was the [[CDC 6600]] (1964), designed by [[James E. Thornton]], which uses a [[Scoreboarding|scoreboard]] to avoid conflicts. It permits an instruction to execute if its source operand (read) registers aren't to be written to by any unexecuted earlier instruction (true dependency) and the destination (write) register not be a register used by any unexecuted earlier instruction (false dependency). The 6600 lacks the means to avoid stalling an [[execution unit]] on false dependencies ([[Hazard_(computer_architecture)#Write_after_write_(WAW)|write after write]] (WAW) and [[Hazard_(computer_architecture)#Write_after_read_(WAR)|write after read]] (WAR) conflicts, respectively termed ''first -order conflict'' and ''third -order conflict'' by Thornton, who termed true dependencies ([[Hazard_(computer_architecture)#Read_after_write_(RAW)|read after write]] (RAW)) as second -order conflict) because each address has only a single ___location referable by it. The WAW is worse than WAR for the 6600, because when an execution unit encounters a WAR, the other execution units still receive and execute instructions, but upon a WAW the assignment of instructions to execution units stops, and they can not receive any further instructions until the WAW-causing instruction's destination register has been written to by earlier instruction.<ref>{{harvtxt|Thornton|1970|p=125-127}}</ref>
 
About two years later, the [[IBM System/360 Model 91]] (1966) introduced [[register renaming]] with [[Tomasulo's algorithm]],<ref>{{citation |title=An Efficient Algorithm for Exploiting Multiple Arithmetic Units |journal=[[IBM Journal of Research and Development]] |volume=11 |issue=1 |pages=25–33 |date=1967 |author-first=Robert Marco |author-last=Tomasulo |author-link=Robert Marco Tomasulo |doi=10.1147/rd.111.0025 |url=https://pdfs.semanticscholar.org/8299/94a1340e5ecdb7fb24dad2332ccf8de0bb8b.pdf |archive-url=https://web.archive.org/web/20180612141530/https://pdfs.semanticscholar.org/8299/94a1340e5ecdb7fb24dad2332ccf8de0bb8b.pdf |url-status=dead |archive-date=2018-06-12 |citeseerx=10.1.1.639.7540|s2cid=8445049 }}</ref> which dissolves false dependencies (WAW and WAR), making full out-of-order execution possible. An instruction addressing a write into a register ''r<sub>n</sub>'' can be executed before an earlier instruction using the register ''r<sub>n</sub>'' is executed, by actually writing into an alternative (renamed) register ''alt-r<sub>n</sub>'', which is turned into a normal register ''r<sub>n</sub>'' only when all the earlier instructions addressing ''r<sub>n</sub>'' have been executed, but until then ''r<sub>n</sub>'' is given for earlier instructions and ''alt-r<sub>n</sub>'' for later ones addressing ''r<sub>n</sub>''.
 
About two years later, the [[IBM System/360 Model 91]] (1966) introduced [[register renaming]] with [[Tomasulo's algorithm]],<ref>{{citation |title=An Efficient Algorithm for Exploiting Multiple Arithmetic Units |journal=[[IBM Journal of Research and Development]] |volume=11 |issue=1 |pages=25–33 |date=1967 |author-first=Robert Marco |author-last=Tomasulo |author-link=Robert Marco Tomasulo |doi=10.1147/rd.111.0025 |url=https://pdfs.semanticscholar.org/8299/94a1340e5ecdb7fb24dad2332ccf8de0bb8b.pdf |archive-url=https://web.archive.org/web/20180612141530/https://pdfs.semanticscholar.org/8299/94a1340e5ecdb7fb24dad2332ccf8de0bb8b.pdf |url-status=dead |archive-date=2018-06-12 |citeseerx=10.1.1.639.7540|s2cid=8445049 }}</ref> which dissolves false dependencies (WAW and WAR), making full out-of-order execution possible. An instruction addressing a write into a register ''r<sub>n</sub>'' can be executed before an earlier instruction using the register ''r<sub>n</sub>'' is executed, by actually writing into an alternative (renamed) register ''alt-r<sub>n</sub>'', which is turned into a normal ("architectural") register ''r<sub>n</sub>'' only when all the earlier instructions addressing ''r<sub>n</sub>'' have been executed, but until then ''r<sub>n</sub>'' is given for earlier instructions and ''alt-r<sub>n</sub>'' for later ones addressing ''r<sub>n</sub>''. In the Model 91 the register renaming is implemented by a [[Operand forwarding|bypass]] termed ''Common Data Bus'' (CDB) and memory source operand buffers, leaving the physical architectural registers unused for many cycles as the oldest state of registers addressed by any unexecuted instruction is found on the CDB. Another advantage the Model 91 has over the 6600 is the ability to execute instructions out-of-order the instructions ''atin the same [[execution unit]]'', not just between the units like the 6600. This is accomplished by [[reservation station]]s, from which instructions go to the execution unit when ready, as opposed to the FIFO queue of each execution unit of the 6600. The Model 91 is also capable of re-ordering loads and stores to execute before the preceding loads and stores,<ref name=zs1/> unlike the 6600, which only has a limited ability to move loads past loads, and stores past stores, but not loads past stores and stores past loads.<ref>{{harvtxt|Thornton|1970|p=48-50}}</ref> Only the floating-point registers of the Model 91 are renamed, making it subject to the same WAW and WAR limitations as the CDC 6600 when running fixed-point codecalculations. The 91 and 6600 both also suffer from [[Tomasulo's algorithm#Exceptions|imprecise exceptions]], which needed to be solved before out-of-order execution could be applied generally and made practical outside supercomputers.<!--[[User:Kvng/RTH]]-->
 
=== Precise exceptions ===