Revision as of 04:12, 27 January 2017 by Orkesh Nurbolat (minor edit; summary: example detail; tag: Visual edit)
Revision as of 09:29, 22 February 2017 by Dewritech (summary: clean up, typo(s) fixed: However → However, using AWB)
Line 1:
{{Refimprove|date=December 2009}}
In [[computer architecture]], '''cycles per instruction''' (also known as '''clock cycles per instruction''', '''clocks per instruction''', or '''CPI''') is one aspect of a [[central processing unit|processor's]] performance: the average number of [[clock cycle]]s per [[Instruction (computer science)|instruction]] for a program or program fragment.<ref>{{cite book |title=Computer Organization and Design: The Hardware/Software Interface |first1=David A. |last1=Patterson |first2=John L. |last2=Hennessy}}</ref> It is the [[multiplicative inverse]] of [[instructions per cycle]].

==Definition==

Line 22:
Each stage requires one clock cycle, and an instruction passes through the stages sequentially. Without [[pipelining]], a new instruction is fetched in stage 1 only after the previous instruction finishes at stage 5, so an instruction takes 5 clock cycles to execute (CPI = 5 > 1). In this case, the processor is said to be ''subscalar''.

With pipelining, a new instruction is fetched every clock cycle by exploiting [[instruction-level parallelism]]. Since one could theoretically have 5 instructions in the 5 pipeline stages at once (one instruction per stage), a different instruction completes stage 5 in every clock cycle, and on average an instruction takes 1 clock cycle to execute (CPI = 1). In this case, the processor is said to be ''scalar''.

With a single-[[Execution unit|execution-unit]] processor, the best CPI attainable is 1. However, with a multiple-execution-unit processor, one may achieve even better CPI values (CPI < 1). In this case, the processor is said to be ''[[superscalar]]''.

To get better CPI values without pipelining, the number of execution units must be greater than the number of stages. For example, with 6 execution units, 6 new instructions are fetched in stage 1 only after the 6 previous instructions finish at stage 5, so on average an instruction takes 5/6 of a clock cycle to execute (CPI = 5/6 < 1).

To get better CPI values with pipelining, there must be at least 2 execution units. For example, with 2 execution units, 2 new instructions are fetched every clock cycle by exploiting instruction-level parallelism, so 2 different instructions complete stage 5 in every clock cycle, and on average an instruction takes 1/2 of a clock cycle to execute (CPI = 1/2 < 1).

==Examples==
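
The idealized CPI figures from the Definition section (5, 1, 5/6 and 1/2) can be reproduced with a short calculation. The following is only a minimal sketch under the same simplifying assumptions as the text: every stage takes one clock cycle, there are no stalls or hazards, and every execution unit is always kept busy. The function names <code>average_cpi</code> and <code>classify</code> are hypothetical and chosen purely for illustration.

```python
from fractions import Fraction


def average_cpi(stages: int, execution_units: int, pipelined: bool) -> Fraction:
    """Idealized steady-state CPI for the simple model described in the text.

    Assumes one clock cycle per stage, no stalls or hazards, and that all
    execution units are fully utilized (illustrative assumptions only).
    """
    if stages < 1 or execution_units < 1:
        raise ValueError("stages and execution_units must be positive")
    if pipelined:
        # With pipelining, each unit completes one instruction per cycle,
        # so `execution_units` instructions finish every clock cycle.
        return Fraction(1, execution_units)
    # Without pipelining, a batch of `execution_units` instructions occupies
    # the processor for `stages` cycles before the next batch is fetched.
    return Fraction(stages, execution_units)


def classify(cpi: Fraction) -> str:
    """Map a CPI value to the terminology used in the text."""
    if cpi > 1:
        return "subscalar"
    if cpi == 1:
        return "scalar"
    return "superscalar"


if __name__ == "__main__":
    # The four configurations discussed in the Definition section.
    cases = [
        (5, 1, False),  # no pipelining, one execution unit
        (5, 1, True),   # pipelining, one execution unit
        (5, 6, False),  # no pipelining, six execution units
        (5, 2, True),   # pipelining, two execution units
    ]
    for stages, units, pipelined in cases:
        cpi = average_cpi(stages, units, pipelined)
        print(f"stages={stages} units={units} pipelined={pipelined}: "
              f"CPI = {cpi} ({classify(cpi)})")
```

Exact fractions are used instead of floating-point numbers so that values such as 5/6 are shown exactly as they appear in the text. Real processors rarely reach these ideal values, because stalls, hazards and cache misses raise the observed CPI.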

Cycles per instruction: Difference between revisions