Classic RISC pipeline: Difference between revisions

Content deleted Content added
Syntax highlight
consistent use of acronym alu
Line 22:
At the same time the register file is read, instruction issue logic in this stage determines if the pipeline is ready to execute the instruction in this stage. If not, the issue logic causes both the Instruction Fetch stage and the Decode stage to stall. On a stall cycle, the input flip flops do not accept new bits, thus no new calculations take place during that cycle.
 
If the instruction decoded is a branch or jump, the target address of the branch or jump is computed in parallel with reading the register file. The branch condition is computed in the following cycle (after the register file is read), and if the branch is taken or if the instruction is a jump, the PC in the first stage is assigned the branch target, rather than the incremented PC that has been computed. Some architectures made use of the [[Arithmetic logic unit|ALU]] (ALU) in the Execute stage, at the cost of slightly decreased instruction throughput.
 
The decode stage ended up with quite a lot of hardware: MIPS has the possibility of branching if two registers are equal, so a 32-bit-wide AND tree runs in series after the register file read, making a very long critical path through this stage (which means fewer cycles per second). Also, the branch target computation generally required a 16 bit add and a 14 bit incrementer. Resolving the branch in the decode stage made it possible to have just a single-cycle branch mis-predict penalty. Since branches were very often taken (and thus mis-predicted), it was very important to keep this penalty low.
 
===Execute===
The Execute stage is where the actual computation occurs. Typically this stage consists of an Arithmetic and Logic UnitALU, and also a bit shifter. It may also include a multiple cycle multiplier and divider.
 
The Arithmetic and Logic UnitALU is responsible for performing boolean operations (and, or, not, nand, nor, xor, xnor) and also for performing integer addition and subtraction. Besides the result, the ALU typically provides status bits such as whether or not the result was 0, or if an overflow occurred.
 
The bit shifter is responsible for shift and rotations.
Line 35:
Instructions on these simple RISC machines can be divided into three latency classes according to the type of the operation:
 
* Register-Register Operation (Single-cycle latency): Add, subtract, compare, and logical operations. During the execute stage, the two arguments were fed to a simple [[Arithmetic logic unit|ALU]], which generated the result by the end of the execute stage.
* Memory Reference (Two-cycle latency). All loads from memory. During the execute stage, the ALU added the two arguments (a register and a constant offset) to produce a virtual address by the end of the cycle.
*[[Cycles per instruction|Multi-cycle Instructions]] (Many cycle latency). Integer multiply and divide and all [[floating-point]] operations. During the execute stage, the operands to these operations were fed to the multi-cycle multiply/divide unit. The rest of the pipeline was free to continue execution while the multiply/divide unit did its work. To avoid complicating the writeback stage and issue logic, multicycle instruction wrote their results to a separate set of registers.