Central processing unit

{{multiple image
|direction = vertical
|width = 220
 
|image1 = Intel 80486DX2 top.jpg
|caption1 = An [[Intel 80486DX2]] CPU from above
 
|image2 = Intel 80486DX2 bottom.jpg
|caption2 = An Intel 80486DX2 from below}}
 
The '''central processing unit''' ('''CPU''') is the portion of a [[computer]] system that carries out the [[Instruction (computer science)|instruction]]s of a [[computer program]] to perform the basic arithmetic, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the [[Human brain|brain]] in the computer. The term has been in use in the computer industry at least since the early 1960s.<ref name="weik1961">{{cite journal | author = Weik, Martin H. | title = A Third Survey of Domestic Electronic Digital Computing Systems | publisher = [[Ballistics Research Laboratory|Ballistic Research Laboratories]] | url = http://ed-thelen.org/comp-hist/BRL61.html | year = 1961 }}</ref> The form, design and implementation of CPUs have changed dramatically since the earliest examples, but their fundamental operation remains much the same.
 
On large machines, CPUs require one or more printed circuit boards. On personal computers and small workstations, the CPU is housed in a single chip called a microprocessor. Since the 1970s the microprocessor class of CPUs has almost completely overtaken all other CPU implementations. Modern CPUs are large-scale [[integrated circuit]]s in small, rectangular packages with multiple connecting pins.
 
Two typical components of a CPU are the arithmetic logic unit (ALU), which performs arithmetic and logical operations, and the control unit (CU), which extracts instructions from memory and decodes and executes them, calling on the ALU when necessary.
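
The interplay between these two components can be illustrated with a minimal fetch-decode-execute loop. The following Python sketch is purely illustrative: the instruction names, three-field encoding, and two-register machine are invented for this example and do not correspond to any real instruction set.

<syntaxhighlight lang="python">
# Toy model: a control unit steps through a program held in memory
# and calls on an ALU for arithmetic.  Hypothetical machine only.

memory = [
    ("LOAD", 0, 7),   # reg0 <- 7
    ("LOAD", 1, 5),   # reg1 <- 5
    ("ADD",  0, 1),   # reg0 <- reg0 + reg1 (delegated to the ALU)
    ("HALT", 0, 0),
]
registers = [0, 0]

def alu(a, b):
    """The ALU performs the actual arithmetic operation."""
    return a + b

pc = 0  # program counter
while True:
    opcode, dst, src = memory[pc]  # fetch and decode
    pc += 1
    if opcode == "HALT":
        break                      # stop execution
    elif opcode == "LOAD":
        registers[dst] = src       # load an immediate value
    elif opcode == "ADD":
        registers[dst] = alu(registers[dst], registers[src])

print(registers[0])  # prints 12
</syntaxhighlight>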
 
Not all computational systems rely on a central processing unit. An array processor or [[vector processor]] has multiple parallel computing elements, with no one unit considered the "center". In the [[distributed computing]] model, problems are solved by a distributed interconnected set of processors.
 
==History==
Computers such as the [[ENIAC]] had to be physically rewired in order to perform different tasks, which caused these machines to be called "fixed-program computers." Since the term "CPU" is generally defined as a [[software]] (computer program) execution device, the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer.
 
The idea of a stored-program computer was already present in the design of [[J. Presper Eckert]] and [[John William Mauchly]]'s ENIAC, but was initially omitted so that the machine could be finished sooner. On June&nbsp;30, 1945, before ENIAC was completed, mathematician [[John von Neumann]] distributed the paper entitled ''[[First Draft of a Report on the EDVAC]]''. It was the outline of a stored-program computer that would eventually be completed in August 1949.<ref>{{cite journal | author = [[John von Neumann|von Neumann, John]] | title = First Draft of a Report on the EDVAC | publisher = [[Moore School of Electrical Engineering]], [[University of Pennsylvania]] | url = http://www.virtualtravelog.net/entries/2003-08-TheFirstDraft.pdf | year = 1945 }}</ref> EDVAC was designed to perform a certain number of instructions (or operations) of various types. These instructions could be combined to create useful programs for the EDVAC to run. Significantly, the programs written for EDVAC were stored in high-speed [[Memory (computers)|computer memory]] rather than specified by the physical wiring of the computer. This overcame a severe limitation of ENIAC: the considerable time and effort required to reconfigure the computer to perform a new task. With von Neumann's design, the program, or software, that EDVAC ran could be changed simply by changing the contents of the memory.
 
Early CPUs were custom-designed as a part of a larger, sometimes one-of-a-kind, computer. However, this method of designing custom CPUs for a particular application has largely given way to the development of mass-produced processors that are made for many purposes. This standardization began in the era of discrete [[transistor]] [[Mainframe computer|mainframes]] and [[minicomputer]]s and has rapidly accelerated with the popularization of the [[integrated circuit]]&nbsp;(IC). The IC has allowed increasingly complex CPUs to be designed and manufactured to tolerances on the order of [[nanometer]]s. Both the miniaturization and standardization of CPUs have increased the presence of digital devices in modern life far beyond the limited application of dedicated computing machines. Modern microprocessors appear in everything from [[automobile]]s to [[cell phone]]s and children's toys.
 
==Overview==
{{Empty section|date=August 2011}}
 
==The control unit==
The control unit of the CPU contains circuitry that uses electrical signals to direct the entire computer system to carry out stored program instructions. The control unit does not execute program instructions; rather, it directs other parts of the system to do so. The control unit must communicate with both the arithmetic/logic unit and memory.
 
===Discrete transistor and integrated circuit CPUs===
During this period, a method of manufacturing many transistors in a compact space gained popularity. The [[integrated circuit]] (IC) allowed a large number of transistors to be manufactured on a single [[semiconductor]]-based [[Die (integrated circuit)|die]], or "chip." At first only very basic non-specialized digital circuits such as [[NOR gate]]s were miniaturized into ICs. CPUs based upon these "building block" ICs are generally referred to as "small-scale integration" (SSI) devices. SSI ICs, such as the ones used in the [[Apollo guidance computer]], usually contained transistor counts numbering in multiples of ten. To build an entire CPU out of SSI ICs required thousands of individual chips, but still consumed much less space and power than earlier discrete transistor designs. As microelectronic technology advanced, an increasing number of transistors were placed on ICs, thus decreasing the quantity of individual ICs needed for a complete CPU. MSI and LSI (medium- and large-scale integration) ICs increased transistor counts to hundreds, and then thousands.
 
In 1964 [[IBM]] introduced its [[System/360]] computer architecture, which was used in a series of computers that could run the same programs at different speeds and performance levels. This was significant at a time when most electronic computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM utilized the concept of a [[microprogram]] (often called "microcode"), which still sees widespread usage in modern CPUs.<ref name="amdahl1964">{{cite journal | author = [[Gene Amdahl|Amdahl, G. M.]], Blaauw, G. A., & Brooks, F. P. Jr. | title = Architecture of the IBM System/360 | publisher = IBM Research | year = 1964 | url = http://www.research.ibm.com/journal/rd/441/amdahl.pdf }}</ref> The System/360 architecture was so popular that it dominated the [[mainframe computer]] market for decades and left a legacy that is still continued by similar modern computers like the IBM [[zSeries]]. In the same year (1964), [[Digital Equipment Corporation]] (DEC) introduced another influential computer aimed at the scientific and research markets, the [[PDP-8]]. DEC would later introduce the extremely popular [[PDP-11]] line, which was originally built with SSI ICs but was eventually implemented with LSI components once these became practical. In stark contrast with its SSI and MSI predecessors, the first LSI implementation of the PDP-11 contained a CPU composed of only four LSI integrated circuits.<ref>{{cite book | author = [[Digital Equipment Corporation]] | year = 1975 | month = November | title = LSI-11, PDP-11/03 user's manual | chapter = LSI-11 Module Descriptions | edition = 2nd | pages = 4–3 | publisher = Digital Equipment Corporation | ___location = Maynard, Massachusetts | url = http://www.classiccmp.org/bitsavers/pdf/dec/pdp11/1103/EK-LSI11-TM-002.pdf }}</ref>
 
Transistor-based computers had several distinct advantages over their predecessors. Aside from facilitating increased reliability and lower power consumption, transistors also allowed CPUs to operate at much higher speeds because of the short switching time of a transistor in comparison to a tube or relay. Thanks to both the increased reliability and the dramatically increased speed of the switching elements (which were almost exclusively transistors by this time), CPU clock rates in the tens of megahertz were obtained during this period. Additionally, while discrete transistor and IC CPUs were in heavy usage, new high-performance designs like [[SIMD]] (single instruction, multiple data) [[vector processor]]s began to appear. These early experimental designs later gave rise to the era of specialized [[supercomputer]]s like those made by [[Cray Inc.]]
|width = 220
|direction = vertical
 
|image1 = 80486dx2-large.jpg
|caption1 = [[Die (integrated circuit)|Die]] of an [[Intel 80486DX2]] microprocessor (actual size: 12×6.75&nbsp;[[millimetre|mm]]) in its packaging
==Design and implementation==
{{Main|CPU design}}
 
===Integer range===
The way a CPU represents numbers is a design choice that affects the most basic ways in which the device functions. Some early digital computers used an electrical model of the common [[decimal]] (base ten) [[numeral system]] to represent numbers internally. A few other computers have used more exotic numeral systems like [[Balanced ternary|ternary]] (base three). Nearly all modern CPUs represent numbers in [[Binary numeral system|binary]] form, with each digit being represented by some two-valued physical quantity such as a "high" or "low" [[volt]]age.<ref>The physical concept of [[voltage]] is an analog one by its nature, practically having an infinite range of possible values. For the purpose of physical representation of binary numbers, set ranges of voltages are defined as one or zero. These ranges are usually influenced by the circuit designs and operational parameters of the switching elements used to create the CPU, such as a [[transistor]]'s threshold level.</ref>
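
As a small illustration of the design consequences, the Python sketch below computes the range of integers an ''n''-bit binary word can hold directly, assuming the common unsigned and two's-complement encodings.

<syntaxhighlight lang="python">
# Integer ranges representable by an n-bit binary word, assuming
# the usual unsigned and two's-complement encodings.

def integer_ranges(bits):
    unsigned = (0, 2**bits - 1)
    twos_complement = (-2**(bits - 1), 2**(bits - 1) - 1)
    return unsigned, twos_complement

for bits in (8, 16, 32, 64):
    u, s = integer_ranges(bits)
    print(f"{bits:2}-bit word: unsigned {u}, two's complement {s}")
</syntaxhighlight>
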
However, architectural improvements alone do not solve all of the drawbacks of globally synchronous CPUs. For example, a clock signal is subject to the delays of any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals to be provided in order to avoid delaying a single signal significantly enough to cause the CPU to malfunction. Another major issue as clock rates increase dramatically is the amount of heat that is dissipated by the CPU. The constantly changing clock causes many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more energy than an element in a static state. Therefore, as clock rate increases, so does heat dissipation, causing the CPU to require more effective cooling solutions.
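
This scaling is commonly approximated by the first-order CMOS dynamic-power model ''P'' = ''αCV''<sup>2</sup>''f'' (activity factor times switched capacitance times voltage squared times frequency). The Python sketch below only illustrates the proportionality; every numeric value in it is invented for illustration.

<syntaxhighlight lang="python">
# First-order CMOS dynamic-power model: P = a * C * V^2 * f.
# All parameter values below are made-up illustrative numbers.

def dynamic_power(activity, capacitance_farads, voltage_volts, frequency_hz):
    return activity * capacitance_farads * voltage_volts**2 * frequency_hz

for f_ghz in (1.0, 2.0, 3.0):
    watts = dynamic_power(activity=0.2,            # fraction of gates switching
                          capacitance_farads=1e-7, # aggregate switched capacitance
                          voltage_volts=1.2,
                          frequency_hz=f_ghz * 1e9)
    print(f"{f_ghz} GHz -> {watts:.1f} W")  # power grows linearly with clock
</syntaxhighlight>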
 
One method of dealing with the switching of unneeded components is called [[clock gating]], which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs. One notable recent CPU design that uses clock gating is that of the IBM [[PowerPC]]-based [[Xbox 360]]; it utilizes extensive clock gating to reduce the power requirements of that video game console.<ref>{{cite web | last = Brown | first = Jeffery | title = Application-customized CPU design | publisher = IBM developerWorks | url = http://www-128.ibm.com/developerworks/power/library/pa-fpfxbox/?ca=dgr-lnxw07XBoxDesign | year = 2005 | accessdate = 2005-12-17 }}</ref> Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire [[Asynchronous circuit#Asynchronous CPU|asynchronous CPU]]s have been built without utilizing a global clock signal. Two notable examples of this are the [[ARM architecture|ARM]]-compliant [[AMULET microprocessor|AMULET]] and the [[MIPS architecture|MIPS]] R3000-compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such as using asynchronous [[Arithmetic logic unit|ALU]]s in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for [[embedded computer]]s.<ref>{{cite journal | author = Garside, J. D., Furber, S. B., & Chung, S-H | title = AMULET3 Revealed | publisher = [[University of Manchester]] Computer Science Department | year = 1999 | url = http://www.cs.manchester.ac.uk/apt/publications/papers/async99_A3.php }}</ref>
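
As a rough illustration of the principle behind clock gating, the toy Python model below charges each functional unit "switching energy" only on cycles when its clock is enabled; the unit names, duty cycles, and energy figures are all invented.

<syntaxhighlight lang="python">
# Toy model of clock gating: a gated-off unit does not switch and
# therefore consumes no dynamic energy that cycle.

class Unit:
    def __init__(self, name, energy_per_switch):
        self.name = name
        self.energy_per_switch = energy_per_switch
        self.energy_used = 0.0

    def tick(self, clock_enabled):
        if clock_enabled:  # gated off -> no switching, no dynamic energy
            self.energy_used += self.energy_per_switch

alu = Unit("ALU", energy_per_switch=1.0)
fpu = Unit("FPU", energy_per_switch=2.0)

for cycle in range(1000):
    alu.tick(clock_enabled=True)               # busy every cycle
    fpu.tick(clock_enabled=(cycle % 10 == 0))  # needed only 10% of the time

print(alu.energy_used, fpu.energy_used)  # 1000.0 200.0
</syntaxhighlight>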
 
===Parallelism===
[[Image:Superscalarpipeline.svg|thumb|Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.]]
 
Further improvement upon the idea of instruction pipelining led to the development of a method that decreases the idle time of CPU components even further. Designs that are said to be ''superscalar'' include a long instruction pipeline and multiple identical execution units.<ref>{{cite web | last = Huynh | first = Jack | title = The AMD Athlon XP Processor with 512KB L2 Cache | publisher = University of Illinois&nbsp;— Urbana-Champaign | pages = 6–11 | url = http://courses.ece.uiuc.edu/ece512/Papers/Athlon.pdf | year = 2003 | accessdate = 2007-10-06 }}</ref> In a superscalar pipeline, multiple instructions are read and passed to a dispatcher, which decides whether or not the instructions can be executed in parallel (simultaneously). If so, they are dispatched to available execution units, allowing several instructions to be executed simultaneously. In general, the more instructions a superscalar CPU is able to dispatch simultaneously to waiting execution units, the more instructions will be completed in a given cycle.
 
Most of the difficulty in the design of a superscalar CPU architecture lies in creating an effective dispatcher. The dispatcher needs to be able to quickly and correctly determine whether instructions can be executed in parallel, as well as dispatch them in such a way as to keep as many execution units busy as possible. This requires that the instruction pipeline be filled as often as possible and gives rise to the need in superscalar architectures for significant amounts of [[CPU cache]]. It also makes [[Hazard (computer architecture)|hazard]]-avoiding techniques like [[branch prediction]], [[speculative execution]], and [[out-of-order execution]] crucial to maintaining high levels of performance. By attempting to predict which branch (or path) a conditional instruction will take, the CPU can minimize the number of times that the entire pipeline must wait until a conditional instruction is completed. Speculative execution often provides modest performance increases by executing portions of code that may not be needed after a conditional operation completes. Out-of-order execution somewhat rearranges the order in which instructions are executed to reduce delays due to data dependencies. Also, in the case of single instruction, multiple data (SIMD), where large amounts of data of the same type must be processed, modern processors can disable parts of the pipeline: when a single instruction is executed many times, the CPU skips the fetch and decode phases, greatly increasing performance in highly repetitive workloads such as video creation and photo processing software.
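
A toy Python sketch of the dispatch decision follows. It checks only register dependences between adjacent instructions in an invented (destination, source, source) format; a real dispatcher evaluates many more hazards, in hardware, across a wider window of instructions.

<syntaxhighlight lang="python">
# 2-wide superscalar dispatch sketch: issue two adjacent instructions
# together only when no register hazard links them.

def independent(first, second):
    dest1 = first[0]
    # RAW/WAW: second must not read or rewrite first's destination.
    # WAR: second's destination must not be one of first's sources.
    return dest1 not in second and second[0] not in first[1:]

program = [("r1", "r2", "r3"), ("r4", "r5", "r6"),  # independent pair
           ("r7", "r1", "r4"), ("r8", "r7", "r2")]  # chained, serialized

cycle, i = 0, 0
while i < len(program):
    if i + 1 < len(program) and independent(program[i], program[i + 1]):
        i += 2   # dual-issue this cycle
    else:
        i += 1   # single-issue
    cycle += 1

print(f"{len(program)} instructions in {cycle} cycles")  # 4 in 3 cycles
</syntaxhighlight>
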
In the case where a portion of the CPU is superscalar and part is not, the part which is not suffers a performance penalty due to scheduling stalls. The Intel [[P5 (microarchitecture)|P5]] [[Pentium (brand)|Pentium]] had two superscalar ALUs which could accept one instruction per clock each, but its FPU could not accept one instruction per clock. Thus the P5 was integer superscalar but not floating point superscalar. Intel's successor to the P5 architecture, [[P6 (microarchitecture)|P6]], added superscalar capabilities to its floating point features, and therefore afforded a significant increase in floating point instruction performance.
 
Both simple pipelining and superscalar design increase a CPU's ILP by allowing a single processor to complete execution of instructions at rates surpassing one instruction per cycle (IPC).<ref>Best-case scenario (or peak) IPC rates in very superscalar architectures are difficult to maintain since it is impossible to keep the instruction pipeline filled all the time. Therefore, in highly superscalar CPUs, average sustained IPC is often discussed rather than peak IPC.</ref> Most modern CPU designs are at least somewhat superscalar, and nearly all general purpose CPUs designed in the last decade are superscalar. In later years some of the emphasis in designing high-ILP computers has been moved out of the CPU's hardware and into its software interface, or [[Instruction set|ISA]]. The strategy of the [[very long instruction word]] (VLIW) causes some ILP to become implied directly by the software, reducing the amount of work the CPU must perform to boost ILP and thereby reducing the design's complexity.
 
====Thread-level parallelism====
| url = http://www.cpu-world.com/Glossary/C/CPU_Frequency.html
| accessdate = 1 January 2010 }}</ref>
Many reported IPS values have represented "peak" execution rates on artificial instruction sequences with few branches, whereas realistic workloads consist of a mix of instructions and applications, some of which take longer to execute than others. The performance of the [[memory hierarchy]] also greatly affects processor performance, an issue barely considered in MIPS calculations. Because of these problems, various standardized tests, often called [[benchmark (computing)|"benchmarks"]] for this purpose, such as [[SPECint]], have been developed to attempt to measure the real effective performance in commonly used applications.
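
As a simple illustration of why peak figures can mislead, instructions per second can be expressed as average IPC (instructions per cycle) times clock rate. The Python sketch below uses invented IPC values for a hypothetical 2&nbsp;GHz processor.

<syntaxhighlight lang="python">
# MIPS = instructions per second / 1e6 = IPC * clock rate / 1e6.
# The IPC values below are invented for illustration.

def mips(ipc, clock_hz):
    return ipc * clock_hz / 1e6

# Peak: a branch-free kernel keeps a 3-wide pipeline full.
print(mips(ipc=3.0, clock_hz=2e9))  # 6000.0 "peak" MIPS

# Sustained: cache misses and branch stalls cut the average IPC.
print(mips(ipc=0.9, clock_hz=2e9))  # 1800.0 MIPS on a realistic mix
</syntaxhighlight>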
 
Processing performance of computers is increased by using [[multi-core processor]]s, which essentially is plugging two or more individual processors (called ''cores'' in this sense) into one [[integrated circuit]].<ref name="tt">{{Cite web
<references />
* <!-- {{note label|HennessyGoldberg1996|Hennessy & Goldberg 1996|a}} --> {{cite book | last = Hennessy | first = John A. | coauthors = Goldberg, David | title = Computer Architecture: A Quantitative Approach | publisher = Morgan Kaufmann Publishers | year = 1996 | isbn = 1-55860-329-8 }}
* {{note label|Knott1974|Knott 1974|a}} Gary D. Knott (1974) ''[http://doi.acm.org/10.1145/775280.775282 A proposal for certain process management and intercommunication primitives]'' ACM SIGOPS Operating Systems Review. Volume 8, Issue 4 (October 1974). pp.&nbsp;7–44
* {{note label|MIPSTech2005|MIPS Technologies 2005|a}} {{cite journal | author = MIPS Technologies, Inc. | title = MIPS32 Architecture For Programmers Volume II: The MIPS32 Instruction Set | publisher = [[MIPS Technologies]], Inc. | year = 2005 | url = http://www.mips.com/content/Documentation/MIPSDocumentation/ProcessorArchitecture/doclibrary }}
* {{note label|Smotherman2005|Smotherman 2005|a}} {{cite web | last = Smotherman | first = Mark | year = 2005 | url = http://www.cs.clemson.edu/~mark/multithreading.html | title = History of Multithreading | accessdate = 2005-12-19 }}
</div>
*[http://www-03.ibm.com/chips/ IBM Microelectronics] - Microelectronics division of [[IBM]], which is responsible for many [[IBM POWER|POWER]] and [[PowerPC]] based designs, including many of the CPUs utilized in recent [[video game console]]s.
*[http://www.intel.com/ Intel Corp] - [[Intel]], a maker of several notable CPU lines, including [[IA-32]] and [[IA-64]]. Also a producer of various peripheral chips for use with their CPUs.
*[http://www.microchip.com/ Microchip Technology Inc.] - [[Microchip Technology|Microchip]], developers of 8- and 16-bit short-pipeline [[RISC]] and [[Digital signal processor|DSP]] microcontrollers.
*[http://www.mips.com/ MIPS Technologies] - [[MIPS Technologies]], developers of the [[MIPS architecture]], a pioneer in [[RISC]] designs.
*[http://www.am.necel.com/ NEC Electronics] - [http://www.am.necel.com/ NEC Electronics], developers of the [http://www.am.necel.com/micro/product/all_8_general.html/ 78K0 8-bit Architecture], [http://www.am.necel.com/micro/product/all_16_general.html/ 78K0R 16-bit Architecture], and [http://www.am.necel.com/micro/product/all_32_general.html/ V850 32-bit Architecture].
{{CPU technologies}}
{{Basic computer components}}
 
 
{{DEFAULTSORT:Central Processing Unit}}
[[jv:Piranti Pamrosésan Sentral]]
[[kn:ಕೇಂದ್ರ ಸಂಸ್ಕರಣ ಘಟಕ]]
[[krc:Процессор]]
[[kk:Процессор]]
[[sw:Bongo kuu (kompyuta)]]
[[la:Processorium medium]]
[[lv:Centrālais procesors]]