Cycles per instruction

# Write-back cycle (WB).
 
Each stage requires one clock cycle, and an instruction passes through the stages sequentially. Without [[pipelining]], a new instruction is fetched in stage 1 only after the previous instruction finishes at stage 5, so the number of clock cycles it takes to execute an instruction is five (CPI = 5 > 1). In this case, the processor is said to be ''subscalar''. With pipelining, a new instruction is fetched every clock cycle by exploiting [[instruction-level parallelism]]. Since one could theoretically have five instructions in the five pipeline stages at once (one instruction per stage), a different instruction would complete stage 5 in every clock cycle, and on average the number of clock cycles it takes to execute an instruction is 1 (CPI = 1). In this case, the processor is said to be ''scalar''.
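
The phrase "on average" matters here: the first instruction still needs all five cycles to fill the pipeline, and only afterwards does one instruction complete per cycle. The following is a minimal sketch of that arithmetic, assuming an idealized five-stage pipeline with no stalls or hazards; the function names `total_cycles` and `cpi` are illustrative and do not come from the article.

```python
def total_cycles(n_instructions, stages=5, pipelined=False):
    """Clock cycles needed to run n_instructions on an idealized processor.

    Assumption: every stage takes exactly one cycle and there are no stalls.
    """
    if n_instructions == 0:
        return 0
    if pipelined:
        # The first instruction takes `stages` cycles to fill the pipeline;
        # each later instruction completes one cycle after the previous one.
        return stages + (n_instructions - 1)
    # Without pipelining, each instruction occupies the processor for all stages.
    return stages * n_instructions


def cpi(n_instructions, stages=5, pipelined=False):
    """Average cycles per instruction over a run of n_instructions (> 0)."""
    return total_cycles(n_instructions, stages, pipelined) / n_instructions


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, cpi(n), cpi(n, pipelined=True))
```

Under these assumptions the non-pipelined CPI stays at 5 for any run length, while the pipelined CPI starts at 5 for a single instruction and approaches 1 as the run gets longer.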
 
With a single-[[Execution unit|execution-unit]] processor, the best CPI attainable is 1. However, with a multiple-execution-unit processor, one may achieve even better CPI values (CPI < 1). In this case, the processor is said to be ''[[superscalar]]''. To get better CPI values without pipelining, the number of execution units must be greater than the number of stages. For example, with six execution units, six new instructions are fetched in stage 1 only after the six previous instructions finish at stage 5, so on average the number of clock cycles it takes to execute an instruction is 5/6 (CPI = 5/6 < 1). To get better CPI values with pipelining, there must be at least two execution units. For example, with two execution units, two new instructions are fetched every clock cycle by exploiting instruction-level parallelism, so two different instructions complete stage 5 in every clock cycle and on average the number of clock cycles it takes to execute an instruction is 1/2 (CPI = 1/2 < 1).
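
The two examples above follow one pattern: in steady state, a batch of instructions (one per execution unit) completes every five cycles without pipelining, or every cycle with pipelining. The sketch below works through that ratio with exact fractions. It assumes an idealized processor with no stalls and enough independent instructions to keep every unit busy; `steady_state_cpi` is a hypothetical helper name, not something defined in the article.

```python
from fractions import Fraction


def steady_state_cpi(units, stages=5, pipelined=False):
    """Idealized steady-state CPI for a processor with `units` execution units.

    Assumption: each stage takes one cycle and every unit always has work.
    """
    if units < 1 or stages < 1:
        raise ValueError("units and stages must be positive")
    # A batch of `units` instructions completes every `stages` cycles without
    # pipelining, or every single cycle once a pipeline is full.
    cycles_per_batch = 1 if pipelined else stages
    return Fraction(cycles_per_batch, units)


if __name__ == "__main__":
    print(steady_state_cpi(6))                  # six units, no pipelining
    print(steady_state_cpi(2, pipelined=True))  # two units, pipelined
    print(steady_state_cpi(5))                  # five units, no pipelining
```

With five units and no pipelining the ratio is exactly 1, which is why the number of execution units must exceed the number of stages before the non-pipelined CPI drops below 1.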