Classic RISC pipeline: Difference between revisions

Content deleted Content added
clean up; add {{reflist}}
Tags: Mobile edit Mobile web edit Advanced mobile edit
 
(4 intermediate revisions by 4 users not shown)
Line 13:
The instructions reside in memory that takes one cycle to read. This memory can be dedicated to SRAM, or an Instruction [[Cache (computing)|Cache]]. The term "latency" is used in computer science often and means the time from when an operation starts until it completes. Thus, instruction fetch has a latency of one [[clock cycle]] (if using single-cycle SRAM or if the instruction was in the cache). Thus, during the [[Instruction fetch|Instruction Fetch]] stage, a 32-bit instruction is fetched from the instruction memory.
 
The [[Programprogram Countercounter]], or (PC) is a register that holds the address that is presented to the instruction memory. The address is presented to instruction memory at the start of a cycle. Then during the cycle, the instruction is read out of instruction memory, and at the same time, a calculation is done to determine the next PC. The next PC is calculated by incrementing the PC by 4, and by choosing whether to take that as the next PC or to take the result of a branch/jump calculation as the next PC. Note that in classic RISC, all instructions have the same length. (This is one thing that separates RISC from CISC <ref>{{cite web |first=David |last=Patterson| title=RISC I: A Reduced Instruction Set VLSI Computer |series=Isca '81|date=12 May 1981|pages=443–457|url=https://dl.acm.org/doi/10.5555/800052.801895}}</ref>). In the original RISC designs, the size of an instruction is 4 bytes, so always add 4 to the instruction address, but don't use PC + 4 for the case of a taken branch, jump, or exception (see '''delayed branches''', below). (Note that some modern machines use more complicated algorithms ([[branch prediction]] and [[branch target predictor|branch target prediction]]) to guess the next instruction address.)
 
===Instruction decode===
Line 29:
The Execute stage is where the actual computation occurs. Typically this stage consists of an ALU, and also a bit shifter. It may also include a multiple cycle multiplier and divider.
 
The ALU is responsible for performing booleanBoolean operations (and, or, not, nand, nor, xor, xnor) and also for performing integer addition and subtraction. Besides the result, the ALU typically provides status bits such as whether or not the result was 0, or if an overflow occurred.
 
The bit shifter is responsible for shift and rotations.
Line 130:
* Predict Not Taken: Always fetch the instruction after the branch from the instruction cache, but only execute it if the branch is not taken. If the branch is not taken, the pipeline stays full. If the branch is taken, the instruction is flushed (marked as if it were a NOP), and one cycle's opportunity to finish an instruction is lost.
* Branch Likely: Always fetch the instruction after the branch from the instruction cache, but only execute it if the branch was taken. The compiler can always fill the branch delay slot on such a branch, and since branches are more often taken than not, such branches have a smaller IPC penalty than the previous kind.
* [[Branch delay slot|Branch Delay Slot]]: AlwaysDepending fetchon the instructiondesign afterof the delayed branch fromand the instructionbranch cacheconditions, andit alwaysis executedetermined it,whether the instruction immediately following the branch instruction is executed even if the branch is taken. Instead of taking an IPC penalty for some fraction of branches either taken (perhaps 60%) or not taken (perhaps 40%), branch delay slots take an IPC penalty for those branches into which the compiler could not schedule the branch delay slot. The SPARC, MIPS, and MC88K designers designed a branch delay slot into their ISAs.
* [[Branch Prediction]]: In parallel with fetching each instruction, guess if the instruction is a branch or jump, and if so, guess the target. On the cycle after a branch or jump, fetch the instruction at the guessed target. When the guess is wrong, flush the incorrectly fetched target.
 
Line 150:
The most common kind of software-visible exception on one of the classic RISC machines is a [[Translation lookaside buffer#TLB-miss handling|''TLB miss'']].
 
Exceptions are different from branches and jumps, because those other control flow changes are resolved in the decode stage. Exceptions are resolved in the writeback stage. When an exception is detected, the following instructions (earlier in the pipeline) are marked as invalid, and as they flow to the end of the pipe their results are discarded. The program counter is set to the address of a special exception handler, and special registers are written with the exception ___location and cause.
 
To make it easy (and fast) for the software to fix the problem and restart the program, the CPU must take a precise exception. A precise exception means that all instructions up to the excepting instruction have been executed, and the excepting instruction and everything afterwards have not been executed.
Line 178:
[[Category:Instruction processing|Pipeline Classic RISC]]
[[Category:Superscalar microprocessors]]
 
[[fa:خط لوله کلاسیک]]