Content deleted Content added
Tag: Reverted |
Tags: Mobile edit Mobile web edit Advanced mobile edit |
||
(7 intermediate revisions by 7 users not shown) | |||
Line 1:
{{Short description|Instruction pipeline}}
{{Use American English|date = March 2019}}
{{
In the [[history of computing hardware|history of computer hardware]], some early [[reduced instruction set computer]] [[central processing unit]]s (RISC CPUs) used a very similar architectural solution, now called a '''classic RISC pipeline'''. Those CPUs were: [[MIPS architecture|MIPS]], [[SPARC]], Motorola [[Motorola 88000|88000]], and later the notional CPU [[DLX]] invented for education.
Line 10:
[[Image:Fivestagespipeline.png|thumb|400px|Basic five-stage pipeline in a [[RISC]] machine (IF = [[Instruction fetch|Instruction Fetch]], ID = Instruction Decode, EX = Execute, MEM = Memory access, WB = Register write back). The vertical axis is successive instructions; the horizontal axis is time. So in the green column, the earliest instruction is in WB stage, and the latest instruction is undergoing instruction fetch.]]
===Instruction fetch===
The instructions reside in memory that takes one cycle to read. This memory can be dedicated to SRAM, or an Instruction [[Cache (computing)|Cache]]. The term "latency" is used in computer science often and means the time from when an operation starts until it completes. Thus, instruction fetch has a latency of one [[clock cycle]] (if using single-cycle SRAM or if the instruction was in the cache). Thus, during the [[Instruction fetch|Instruction Fetch]] stage, a 32-bit instruction is fetched from the instruction memory.
The [[
===Instruction decode===
Line 29:
The Execute stage is where the actual computation occurs. Typically this stage consists of an ALU, and also a bit shifter. It may also include a multiple cycle multiplier and divider.
The ALU is responsible for performing
The bit shifter is responsible for shift and rotations.
Line 37:
* Register-Register Operation (Single-cycle latency): Add, subtract, compare, and logical operations. During the execute stage, the two arguments were fed to a simple ALU, which generated the result by the end of the execute stage.
* Memory Reference (Two-cycle latency). All loads from memory. During the execute stage, the ALU added the two arguments (a register and a constant offset) to produce a virtual address by the end of the cycle.
*[[Cycles per instruction
===Memory access===
Line 50:
During this stage, both single cycle and two cycle instructions write their results into the register file.
Note that two different stages are accessing the register file at the same
On real silicon, this can be a hazard (see below for more on hazards). That is because one of the source registers being read in decode might be the same as the destination register being written in writeback. When that happens, then the same memory cells in the register file are being both read and written the same time. On silicon, many implementations of memory cells will not operate correctly when read and written at the same time.
Line 130:
* Predict Not Taken: Always fetch the instruction after the branch from the instruction cache, but only execute it if the branch is not taken. If the branch is not taken, the pipeline stays full. If the branch is taken, the instruction is flushed (marked as if it were a NOP), and one cycle's opportunity to finish an instruction is lost.
* Branch Likely: Always fetch the instruction after the branch from the instruction cache, but only execute it if the branch was taken. The compiler can always fill the branch delay slot on such a branch, and since branches are more often taken than not, such branches have a smaller IPC penalty than the previous kind.
* [[Branch delay slot|Branch Delay Slot]]:
* [[Branch Prediction]]: In parallel with fetching each instruction, guess if the instruction is a branch or jump, and if so, guess the target. On the cycle after a branch or jump, fetch the instruction at the guessed target. When the guess is wrong, flush the incorrectly fetched target.
Line 148:
But the programmer, especially if programming in a language supporting [[large numbers|large integers]] (e.g. [[Lisp (programming language)|Lisp]] or [[Scheme (programming language)|Scheme]]), may not want wrapping arithmetic. Some architectures (e.g. MIPS), define special addition operations that branch to special locations on overflow, rather than wrapping the result. Software at the target ___location is responsible for fixing the problem. This special branch is called an exception. Exceptions differ from regular branches in that the target address is not specified by the instruction itself, and the branch decision is dependent on the outcome of the instruction.
The most common kind of software-visible exception on one of the classic RISC machines is a [[
Exceptions are different from branches and jumps, because those other control flow changes are resolved in the decode stage. Exceptions are resolved in the writeback stage. When an exception is detected, the following instructions (earlier in the pipeline) are marked as invalid, and as they flow to the end of the pipe their results are discarded.
To make it easy (and fast) for the software to fix the problem and restart the program, the CPU must take a precise exception. A precise exception means that all instructions up to the excepting instruction have been executed, and the excepting instruction and everything afterwards have not been executed.
Line 169:
== References ==
{{Reflist}}
{{refbegin}}
* {{cite book |first1=John L. |last1=Hennessy |first2=David A. |last2=Patterson |title=Computer Architecture, A Quantitative Approach |publisher=Morgan Kaufmann |edition=5th |year=2011 |isbn=978-0123838728 |url=https://books.google.com/books?id=v3-1hVwHnHwC}}
|