Revision as of 02:53, 22 October 2017 by Bubba73 (m →Explanation: ditto), compared with revision as of 02:55, 22 October 2017 by Bubba73 (m →Explanation)
Line 20:
# Write-back cycle (WB).

Each stage requires one clock cycle, and an instruction passes through the stages sequentially. Without [[pipelining]], a new instruction is fetched in stage 1 only after the previous instruction finishes at stage 5, so the number of clock cycles it takes to execute an instruction is five (CPI = 5 > 1). In this case, the processor is said to be ''subscalar''.

With pipelining, a new instruction is fetched every clock cycle by exploiting [[instruction-level parallelism]]. Since one could theoretically have five instructions in the five pipeline stages at once (one instruction per stage), a different instruction would complete stage 5 in every clock cycle, and on average the number of clock cycles it takes to execute an instruction is 1 (CPI = 1). In this case, the processor is said to be ''scalar''.

With a single-[[Execution unit|execution-unit]] processor, the best CPI attainable is 1. With a multiple-execution-unit processor, however, one may achieve even better CPI values (CPI < 1). In this case, the processor is said to be ''[[superscalar]]''.

To get better CPI values without pipelining, the number of execution units must be greater than the number of stages. For example, with 6 execution units, 6 new instructions are fetched in stage 1 only after the 6 previous instructions finish at stage 5, so on average the number of clock cycles it takes to execute an instruction is 5/6 (CPI = 5/6 < 1).

To get better CPI values with pipelining, there must be at least 2 execution units. For example, with 2 execution units, 2 new instructions are fetched every clock cycle by exploiting instruction-level parallelism, so 2 different instructions would complete stage 5 in every clock cycle, and on average the number of clock cycles it takes to execute an instruction is 1/2 (CPI = 1/2 < 1).
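The arithmetic in the examples above can be reproduced with a small sketch. It assumes the idealized model described in the text: every stage takes one clock cycle, there are no stalls or hazards, and a group of instructions (one per execution unit) is fetched either every cycle (pipelined) or once every full pass through the stages (not pipelined). The function names <code>average_cpi</code> and <code>classify</code> are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def average_cpi(stages: int, execution_units: int, pipelined: bool) -> Fraction:
    """Idealized average cycles per instruction (assumes no stalls or hazards).

    Without pipelining, a group of `execution_units` instructions is fetched
    only once every `stages` cycles; with pipelining, a group is fetched
    every cycle.
    """
    if stages < 1 or execution_units < 1:
        raise ValueError("stages and execution_units must be positive")
    cycles_between_fetches = 1 if pipelined else stages
    return Fraction(cycles_between_fetches, execution_units)


def classify(cpi: Fraction) -> str:
    """Name the processor class used in the text for a given CPI."""
    if cpi > 1:
        return "subscalar"
    if cpi == 1:
        return "scalar"
    return "superscalar"


if __name__ == "__main__":
    # The four scenarios from the text, all with a five-stage design.
    scenarios = [
        (5, 1, False),  # one unit, no pipelining
        (5, 1, True),   # one unit, pipelined
        (5, 6, False),  # six units, no pipelining
        (5, 2, True),   # two units, pipelined
    ]
    for stages, units, pipelined in scenarios:
        cpi = average_cpi(stages, units, pipelined)
        print(stages, units, pipelined, cpi, classify(cpi))
```

Exact fractions are used rather than floating-point numbers so that values such as 5/6 compare cleanly against 1. Real processors rarely reach these ideal figures, because stalls and dependencies between instructions add cycles that this sketch ignores.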

Cycles per instruction: Difference between revisions