Latency oriented processor architecture

'''[[Latency (engineering)|Latency]] oriented processor architecture''' is the [[microarchitecture]] of a [[microprocessor]] designed to serve a serial computing [[Thread (computing)|thread]] with low latency. This is typical of most [[Central processing unit|central processing units (CPUs)]] developed since the 1970s. These architectures generally aim to execute as many instructions as possible belonging to a single serial thread in a given window of time; however, the time to execute a single instruction completely, from fetch to retire, may vary from a few cycles to a few hundred cycles in some cases.<ref>John Paul Shen, Mikko H. Lipasti (2013). ''Modern Processor Design''. McGraw-Hill Professional. ISBN 1478607831.</ref> Latency oriented processor architectures are the opposite of throughput-oriented processors, which concern themselves more with the total [[throughput]] of the system than with the service latencies of the individual threads they work on.<ref name=YanSohilin2016>Yan Solihin (2016). ''Fundamentals of Parallel Multicore Architecture''. Chapman & Hall/CRC Computational Science. ISBN 978-1482211184.</ref><ref name=GarlandKirk>Michael Garland, David B. Kirk, "Understanding Throughput-Oriented Architectures", Communications of the ACM, Vol. 53, No. 11, pp. 58–66.</ref>
 
==[[Flynn's taxonomy]]==
 
==Implementation techniques==
There are many architectural techniques employed to reduce the overall latency of a single computing task. These typically involve adding extra hardware to the [[Pipeline (computing)|pipeline]] to serve instructions as soon as they are fetched from [[Random-access memory|memory]] or the [[CPU cache|instruction cache]]. A notable characteristic of these architectures is that a significant area of the chip is devoted to parts other than the [[Execution unit|execution units]] themselves, because the intent is to bring down the time required to complete a 'typical' task in a computing environment. A typical computing task is a serial sequence of instructions with heavy dependence on results produced by earlier instructions of the same task. Hence, the microprocessor ends up spending much of its time on work other than the calculations the individual instructions require. If the [[Hazard (computer architecture)|hazards]] encountered during computation are not resolved quickly, the latency of the thread increases: hazards stall the execution of subsequent instructions and, depending upon the pipeline implementation, may either halt progress completely until the dependency is resolved or cascade into further hazards in later instructions, exacerbating the thread's execution time even more.<ref name="quant">John L. Hennessy, David A. Patterson, ''Computer Architecture: A Quantitative Approach'', Fifth Edition (2013), Morgan Kaufmann Publishers, ISBN 012383872X.</ref><ref name="interface">David A. Patterson, John L. Hennessy, ''Computer Organization and Design: The Hardware/Software Interface'', Fifth Edition (2013), Morgan Kaufmann Publishers, ISBN 9780124078864.</ref>
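The effect of unresolved hazards on thread latency can be sketched with a toy in-order pipeline model (this simulation is illustrative only, not drawn from the cited sources; the instruction format and the 3-cycle latency are hypothetical). Each instruction stalls until every register it reads has been produced, so a serial chain of dependent instructions accumulates the full latency of each step, while independent instructions overlap:

```python
def run(instructions):
    """Each instruction is (dest, sources, latency_in_cycles).
    Returns the cycle at which the last result becomes available,
    assuming one instruction may issue per cycle, in order."""
    ready = {}   # register -> cycle its value becomes available
    cycle = 0    # earliest cycle the next instruction may issue
    for dest, sources, latency in instructions:
        # Stall on read-after-write hazards: wait for all source operands.
        issue = max([cycle] + [ready.get(s, 0) for s in sources])
        ready[dest] = issue + latency
        cycle = issue + 1
    return max(ready.values())

# A serial chain: every instruction needs the previous result (hazard each step).
chain = [("r1", [], 3), ("r2", ["r1"], 3), ("r3", ["r2"], 3), ("r4", ["r3"], 3)]
# Independent instructions: no hazards, so their latencies overlap.
indep = [("r1", [], 3), ("r2", [], 3), ("r3", [], 3), ("r4", [], 3)]

print(run(chain))  # 12: each instruction waits 3 cycles for its input
print(run(indep))  # 6: only the final instruction's 3-cycle latency trails
```

Under this model, the dependent chain takes twice as long as the same four instructions with no dependencies, which is why latency oriented designs spend chip area on resolving hazards quickly (forwarding, scheduling, prediction) rather than on more execution units.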
 
The design space of micro-architectural techniques is very large. Below are some of the most commonly employed techniques to reduce the overall latency for a thread.
===Instruction set architecture (ISA)===
{{Main article|Instruction set}}
Most architectures today use shorter and simpler instructions, as in the [[load/store architecture]], which help optimize the instruction pipeline for faster execution. Instructions are usually all of the same size, which also simplifies the instruction fetch logic. Such an ISA is called a [[Reduced instruction set computing|RISC]] architecture.<ref>Dileep Bhandarkar, Douglas W. Clark, ''Performance from Architecture: Comparing a RISC and a CISC with Similar Hardware Organization'', ASPLOS IV Proceedings of the Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 310–319.</ref>
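A minimal sketch of why uniform instruction size simplifies fetch (the 4-byte width and the variable-length sequence below are hypothetical, not from the cited paper): with fixed-size encodings the address of the ''i''-th instruction is pure arithmetic, whereas variable-length encodings force a walk over every earlier instruction, since each length is known only after decoding it.

```python
FIXED_SIZE = 4  # bytes per instruction, as in many RISC ISAs

def fixed_fetch_addr(base, i):
    # One multiply-add; every instruction boundary is known up front.
    return base + i * FIXED_SIZE

def variable_fetch_addr(base, i, lengths):
    # Must sum the lengths of all earlier instructions to find the boundary.
    return base + sum(lengths[:i])

print(hex(fixed_fetch_addr(0x1000, 5)))                        # 0x1014
print(hex(variable_fetch_addr(0x1000, 5, [1, 3, 2, 6, 4, 2])))  # 0x1010
```

The fixed-size case also lets the fetch unit pull several instructions per cycle without first decoding any of them, which is one reason regular encodings suit latency oriented pipelines.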
 
===Instruction Pipelining===