{{Short description|Set of rules describing computer system}}
{{Lead too short|date=November 2023}}
[[File:Computer architecture block diagram.png|alt=|thumb|upright=1.35|Block diagram of a basic computer with uniprocessor CPU. Black lines indicate the flow of control signals, whereas red lines indicate the flow of processor instructions and data. Arrows indicate the direction of flow.]]
In [[computer science]] and [[computer engineering]], a '''computer architecture''' is the structure of a [[computer]] system made from component parts.<ref>{{cite web|last=Dragoni|first=Nicole|title=Introduction to peer to peer computing|url=http://www2.imm.dtu.dk/courses/02220/2017/L6/P2P.pdf|website=DTU Compute – Department of Applied Mathematics and Computer Science|___location=Lyngby, Denmark|date=n.d.}}</ref> It can sometimes be a high-level description that ignores details of the implementation.<ref>{{cite book|last1=Clements|first1=Alan|title=Principles of Computer Hardware|page=1|edition=Fourth|quote=Architecture describes the internal organization of a computer in an abstract way; that is, it defines the capabilities of the computer and its programming model. You can have two computers that have been constructed in different ways with different technologies but with the same architecture.}}</ref> At a more detailed level, the description may include the [[instruction set architecture]] design, [[microarchitecture]] design, [[logic design]], and [[implementation]].<ref>{{cite book|last1=Hennessy|first1=John|last2=Patterson|first2=David|title=Computer Architecture: A Quantitative Approach|page=11|edition=Fifth|quote=This task has many aspects, including instruction set design, functional organization, logic design, and implementation.}}</ref>
 
== History ==
The first documented computer architecture was in the correspondence between [[Charles Babbage]] and [[Ada Lovelace]], describing the [[analytical engine]]. While building the computer [[Z1 (computer)|Z1]] in 1936, [[Konrad Zuse]] described in two patent applications for his future projects that machine instructions could be stored in the same storage used for data, i.e., the [[Stored-program computer|stored-program]] concept.<ref>{{citation |title=Electronic Digital Computers |journal=Nature |date=25 September 1948 |volume=162 |page=487 |doi=10.1038/162487a0 |last1=Williams |first1=F. C. |last2=Kilburn |first2=T. |issue=4117 |bibcode=1948Natur.162..487W |s2cid=4110351 |doi-access=free }}</ref><ref>Susanne Faber, "Konrad Zuses Bemuehungen um die Patentanmeldung der Z3", 2000</ref> Two other early and important examples are:
* [[John von Neumann]]'s 1945 paper, [[First Draft of a Report on the EDVAC]], which described an organization of logical elements;<ref>{{Cite book|title=First Draft of a Report on the EDVAC|last=Neumann|first=John|year=1945|pages=9}}</ref> and
*[[Alan M. Turing|Alan Turing]]'s more detailed ''Proposed Electronic Calculator'' for the [[Automatic Computing Engine]], also 1945 and which cited [[John von Neumann]]'s paper.<ref>Reproduced in B. J. Copeland (Ed.), "Alan Turing's Automatic Computing Engine", Oxford University Press, 2005, pp. 369–454.</ref>
 
The term "architecture" in computer literature can be traced to the work of Lyle R. Johnson and [[Fred Brooks|Frederick P. Brooks, Jr.]], members of the Machine Organization department in IBM's main research center in 1959. Johnson had the opportunity to write a proprietary research communication about the [[IBM 7030 Stretch|Stretch]], an IBM-developed [[supercomputer]] for [[Los Alamos National Laboratory]] (at the time known as Los Alamos Scientific Laboratory). To describe the level of detail for discussing the luxuriously embellished computer, he noted that his description of formats, instruction types, hardware parameters, and speed enhancements were at the level of "system architecture", a term that seemed more useful than "machine organization".<ref>{{cite web|url=https://archive.computerhistory.org/resources/text/IBM/Stretch/pdfs/05-10/102634114.pdf |last1= Johnson |first1=Lyle| title= A Description of Stretch|page=1|year=1960|access-date=7 October 2017}}</ref>
 
Subsequently, Brooks, a Stretch designer, opened Chapter 2 of a book called ''Planning a Computer System: Project Stretch'' by stating, "Computer architecture, like other architecture, is the art of determining the needs of the user of a structure and then designing to meet those needs as effectively as possible within economic and technological constraints."<ref>{{Cite book |title= Planning a Computer System|last=Buchholz |first=Werner|year=1962|pages=5}}</ref>
 
Brooks went on to help develop the [[IBM System/360]] line of computers, in which "architecture" became a noun defining "what the user needs to know".<ref>{{Cite web|url=http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/system360/|archive-url=https://web.archive.org/web/20120403020049/http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/system360/|url-status=dead|archive-date=April 3, 2012|title=System 360, From Computers to Computer Systems|website=IBM100|date=7 March 2012|access-date=11 May 2017}}</ref> The System/360 line was succeeded by several compatible lines of computers, including the current [[IBM Z]] line. Later, computer users came to use the term in many less explicit ways.<ref>{{cite book|last1=Hellige|first1=Hans Dieter|title=Geschichten der Informatik: Visionen, Paradigmen, Leitmotive|chapter=Die Genese von Wissenschaftskonzeptionen der Computerarchitektur: Vom "system of organs" zum Schichtmodell des Designraums| pages=411–472|year=2004}}</ref>
 
The earliest computer architectures were designed on paper and then directly built into the final hardware form.<ref>ACE underwent seven paper designs in one year, before a prototype was initiated in 1948. [B. J. Copeland (Ed.), "Alan Turing's Automatic Computing Engine", OUP, 2005, p. 57]</ref>
Later, computer architecture prototypes were physically built in the form of a [[transistor–transistor logic]] (TTL) computer—such as the prototypes of the [[Motorola 6800#Development team|6800]] and the [[PA-RISC]]—tested, and tweaked, before committing to the final hardware form.
As of the 1990s, new computer architectures are typically "built", tested, and tweaked—inside some other computer architecture in a [[computer architecture simulator]]; or inside an FPGA as a [[soft microprocessor]]; or both—before committing to the final hardware form.<ref>{{Cite web|url=https://www.cise.ufl.edu/~mssz/CompOrg/CDAintro.html|title=Organization of Computer Systems|last=Schmalz|first=M.S.|website=UF CISE|access-date=11 May 2017}}</ref>
 
== Subcategories ==
The discipline of computer architecture has three main subcategories:<ref name=HennessyPattersonQuantitative>{{cite book|author=John L. Hennessy and David A. Patterson|title=Computer Architecture: A Quantitative Approach|edition=Third|publisher=Morgan Kaufmann Publishers}}</ref>
* '''[[Instruction set architecture]]''' (ISA): defines the [[machine code]] that a [[computer processor|processor]] reads and acts upon, as well as the [[word size]], [[addressing mode|memory address modes]], [[processor register]]s, and [[data type]]s.
* '''[[Microarchitecture]]''': also known as "computer organization", this describes how a particular [[Central processing unit|processor]] will implement the ISA.<ref>{{cite book|title= Dictionary of Computer Science, Engineering, and Technology|last=Laplante|first=Phillip A.|year=2001|publisher=CRC Press|isbn=0-8493-2691-5|pages=94–95}}</ref> The size of a computer's [[CPU cache]], for instance, is an issue that generally has nothing to do with the ISA (a sketch contrasting the two levels follows this list).
* '''[[Systems design]]''': includes all of the other hardware components within a computing system, such as data processing other than the CPU (e.g., [[direct memory access]]), [[virtualization]], and [[multiprocessing]].
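
The distinction between the first two levels can be sketched in ordinary code. In the following minimal Python sketch, the "architecture" is a hypothetical four-instruction set (not any real ISA), and the two functions stand in for two different "microarchitectures": their internal organization differs, but every program must produce the same result on both.
<syntaxhighlight lang="python">
# A hypothetical toy ISA: (opcode, operands) tuples. Two "implementations"
# with different internal organization must agree on every result.

PROGRAM = [
    ("LOAD", "r0", 6),     # r0 <- 6
    ("LOAD", "r1", 7),     # r1 <- 7
    ("MUL", "r0", "r1"),   # r0 <- r0 * r1
    ("HALT",),
]

def run_simple(program):
    """Implementation A: decode and execute each instruction on the fly."""
    regs = {"r0": 0, "r1": 0}
    for inst in program:
        if inst[0] == "LOAD":
            regs[inst[1]] = inst[2]
        elif inst[0] == "MUL":
            regs[inst[1]] = regs[inst[1]] * regs[inst[2]]
        elif inst[0] == "HALT":
            break
    return regs["r0"]

def run_predecoded(program):
    """Implementation B: decode once into callables, then execute.
    A different internal organization, same architectural behavior."""
    regs = {"r0": 0, "r1": 0}
    decoded = []
    for inst in program:
        if inst[0] == "LOAD":
            decoded.append(lambda r, d=inst[1], v=inst[2]: r.update({d: v}))
        elif inst[0] == "MUL":
            decoded.append(lambda r, d=inst[1], s=inst[2]: r.update({d: r[d] * r[s]}))
        elif inst[0] == "HALT":
            decoded.append(None)
    for step in decoded:
        if step is None:
            break
        step(regs)
    return regs["r0"]

assert run_simple(PROGRAM) == run_predecoded(PROGRAM) == 42
</syntaxhighlight>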
 
There are other technologies in computer architecture. The following techniques are used in larger companies like Intel, and in 2002<ref name=HennessyPattersonQuantitative /> were estimated to account for 1% of all of computer architecture:
* '''Macroarchitecture''': [[architectural layer]]s more abstract than microarchitecture
* '''Assembly instruction set architecture''': A smart assembler may convert an abstract [[assembly language]] common to a group of machines into slightly different [[machine language]] for different [[implementation]]s.
* '''Programmer-visible macroarchitecture''': higher-level language tools such as [[compiler]]s may define a consistent interface or contract to [[programmer]]s using them, abstracting differences between underlying ISAs and [[microarchitecture]]s. For example, the [[C (programming language)|C]], [[C++]], or [[Java (programming language)|Java]] standards define different programmer-visible macroarchitectures.
* '''[[Microcode]]''': microcode is software that translates instructions to run on a chip. It acts like a wrapper around the hardware, presenting a preferred version of the hardware's instruction set interface. This instruction translation facility gives chip designers flexible options: for example, a new, improved version of the chip can use microcode to present the exact same instruction set as the old chip version, so that all software targeting that instruction set will run on the new chip without needing changes; alternatively, microcode can present a variety of instruction sets for the same underlying chip, allowing it to run a wider variety of software (a sketch of this idea follows the list).
* '''Pin architecture''': The hardware functions that a [[microprocessor]] should provide to a hardware platform, e.g., the [[x86]] pins A20M, FERR/IGNNE or FLUSH. Also, messages that the processor should emit so that external [[CPU cache|caches]] can be invalidated (emptied). Pin architecture functions are more flexible than ISA functions because external hardware can adapt to new encodings, or change from a pin to a message. The term "architecture" fits, because the functions must be provided for compatible systems, even if the detailed method changes.
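
The microcode idea can be sketched in a few lines of Python. In this hypothetical example, the hardware provides only complement, increment, and add micro-operations; a microcode table synthesizes an architectural SUB instruction from them via two's complement, so software sees an instruction the hardware never implements directly.
<syntaxhighlight lang="python">
# Minimal sketch: "microcode" expands one architectural instruction into a
# sequence of simpler micro-operations that the hardware actually has.
# The machine and its micro-ops are hypothetical.

MASK = 0xFF  # an 8-bit datapath

def u_not(state, dst, src):   # micro-op: bitwise complement
    state[dst] = ~state[src] & MASK

def u_inc(state, dst, _):     # micro-op: add 1
    state[dst] = (state[dst] + 1) & MASK

def u_add(state, dst, src):   # micro-op: add register to register
    state[dst] = (state[dst] + state[src]) & MASK

# Microcode table: the architectural SUB instruction does not exist in
# hardware; it is expanded into NOT + INC (two's complement) + ADD.
MICROCODE = {
    "SUB": [(u_not, "tmp", "b"), (u_inc, "tmp", None), (u_add, "a", "tmp")],
}

state = {"a": 9, "b": 3, "tmp": 0}
for micro_op, dst, src in MICROCODE["SUB"]:
    micro_op(state, dst, src)
assert state["a"] == 6   # 9 - 3, computed with no SUB hardware at all
</syntaxhighlight>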
 
==Roles==
 
===Definition===
Computer architecture is concerned with balancing the performance, efficiency, cost, and reliability of a computer system. The case of instruction set architecture can be used to illustrate the balance of these competing factors. More complex [[instruction set]]s enable programmers to write more space-efficient programs, since a single instruction can encode some higher-level abstraction (such as the [[X86 instruction listings|x86 Loop instruction]]).<ref>{{cite book |last1=Null |first1=Linda |title=The Essentials of Computer Organization and Architecture |date=2019 |publisher=Jones & Bartlett Learning |___location=Burlington, MA |isbn=9781284123036 |page=280 |edition=5th}}</ref> However, longer and more complex instructions take longer for the [[Processor (computing)|processor]] to decode and can be more costly to implement effectively. The increased complexity from a large instruction set also creates more room for unreliability when instructions interact in unexpected ways.
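
A back-of-the-envelope comparison makes the space trade-off concrete. The byte counts below are assumptions for illustration, not measurements of any real processor: one complex loop instruction is assumed to occupy 2 bytes, while an equivalent sequence of simple instructions (decrement, compare, branch) occupies three 4-byte instructions.
<syntaxhighlight lang="python">
# Assumed encodings, for illustration only: one CISC-style LOOP instruction
# vs. the equivalent RISC-style decrement/compare/branch sequence.
CISC_LOOP_BYTES = 2        # one compact, complex instruction
RISC_LOOP_BYTES = 3 * 4    # three fixed-size 4-byte instructions

loops_in_program = 500
print("loop overhead, CISC-style:", loops_in_program * CISC_LOOP_BYTES, "bytes")
print("loop overhead, RISC-style:", loops_in_program * RISC_LOOP_BYTES, "bytes")
# The complex encoding is 6x denser here, but decoding it takes more
# hardware and time, which is exactly the trade-off described above.
</syntaxhighlight>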
 
The implementation involves [[integrated circuit design]], packaging, [[Electric power|power]], and [[Computer cooling|cooling]]. Optimization of the design requires familiarity with topics from [[compiler]]s and [[operating system]]s to [[logic design]] and packaging.<ref>{{Cite web|url=https://www.cis.upenn.edu/~milom/cis501-Fall11/lectures/00_intro.pdf|title=What is computer architecture?|last=Martin|first=Milo|website=UPENN|access-date=11 May 2017}}</ref>
 
===Instruction set architecture===
{{Main|Instruction set architecture}}
 
An [[instruction set architecture]] (ISA) is the interface between the computer's software and hardware and also can be viewed as the programmer's view of the machine. Computers do not understand [[high-level programming language]]s such as [[Java (programming language)|Java]], [[C++]], or most other programming languages in use. A processor only understands instructions encoded in some numerical fashion, usually as [[Binary numeral system|binary number]]s. Software tools, such as [[compiler]]s, translate those high-level languages into instructions that the processor can understand.<ref>{{cite web |title=Glossary |url=https://codasip.com/glossary/isa |website=Codasip |access-date=30 May 2025}}</ref><ref>{{cite web |title=What is Instruction Set Architecture (ISA)? |url=https://www.arm.com/glossary/isa |website=The Architecture for the Digital World |access-date=30 May 2025 |language=en}}</ref>
 
Besides instructions, the ISA defines items in the computer that are available to a program&mdash;e.g., [[data type]]s, [[Processor register|registers]], [[addressing mode]]s, and [[Computer memory|memory]]. Instructions locate these available items with register indexes (or names) and memory addressing modes.<ref>{{cite web |title=Organization of Computer Systems: ISA, Machine Language, Number Systems |url=https://www.cise.ufl.edu/~mssz/CompOrg/CDA-lang.html |website=www.cise.ufl.edu |access-date=30 May 2025}}</ref><ref>{{cite web |title=Instruction Set Architecture – Computer Architecture |url=https://www.cs.umd.edu/~meesh/411/CA-online/chapter/instruction-set-architecture/index.html |website=www.cs.umd.edu |access-date=30 May 2025}}</ref>
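
How instructions "locate" operands can be shown with bit fields. The 16-bit instruction format below is hypothetical, chosen only for illustration: a 4-bit opcode, a 4-bit addressing mode, and two 4-bit register indexes.
<syntaxhighlight lang="python">
# Minimal sketch: how an instruction word locates its operands.
# The 16-bit format is hypothetical:
#   [15:12] opcode | [11:8] addressing mode | [7:4] dst reg | [3:0] src reg

def decode(word):
    return {
        "opcode": (word >> 12) & 0xF,
        "mode":   (word >> 8)  & 0xF,
        "dst":    (word >> 4)  & 0xF,
        "src":    word         & 0xF,
    }

# 0x1203: opcode 1, mode 2 (say, register-indirect), dst r0, src r3
assert decode(0x1203) == {"opcode": 1, "mode": 2, "dst": 0, "src": 3}
</syntaxhighlight>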
 
The ISA of a computer is usually described in a small instruction manual, which describes how the instructions are encoded. It may also define short, vaguely mnemonic names for the instructions. The names can be recognized by a software development tool called an [[assembler (computer programming)|assembler]]. An assembler is a computer program that translates a human-readable form of the ISA into a computer-readable form. [[Disassembler]]s are also widely available, usually in [[debugger]]s and software programs to isolate and correct malfunctions in binary computer programs.<ref>{{cite book |last1=Hennessy |first1=John L. |last2=Patterson |first2=David A. |title=Computer Architecture: A Quantitative Approach |date=23 November 2017 |publisher=[[Morgan Kaufmann Publishers]] |isbn=978-0-12-811906-8 |url=https://google.com/books/edition/Computer_Architecture/cM8mDwAAQBAJ |access-date=30 May 2025 |language=en}}</ref>
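
A minimal sketch of the assembler/disassembler round trip, assuming a hypothetical ISA in which every instruction is an opcode byte followed by an operand byte:
<syntaxhighlight lang="python">
# Toy assembler/disassembler for a hypothetical two-byte-per-instruction ISA.

OPCODES = {"LDA": 0x01, "ADD": 0x02, "STA": 0x03, "HLT": 0xFF}
MNEMONICS = {v: k for k, v in OPCODES.items()}

def assemble(lines):
    """Translate human-readable mnemonics into machine bytes."""
    code = bytearray()
    for line in lines:
        parts = line.split()
        code.append(OPCODES[parts[0]])
        code.append(int(parts[1], 0) if len(parts) > 1 else 0)
    return bytes(code)

def disassemble(code):
    """Translate machine bytes back into mnemonics."""
    return [f"{MNEMONICS[code[i]]} {code[i + 1]:#04x}"
            for i in range(0, len(code), 2)]

machine = assemble(["LDA 0x10", "ADD 0x11", "STA 0x12", "HLT"])
assert machine == bytes([0x01, 0x10, 0x02, 0x11, 0x03, 0x12, 0xFF, 0x00])
print(disassemble(machine))
</syntaxhighlight>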
 
ISAs vary in quality and completeness. A good ISA compromises between [[programmer]] convenience (how easy the code is to understand), size of the code (how much code is required to do a specific action), cost of the [[computer]] to interpret the instructions (more complexity means more hardware needed to decode and execute the instructions), and speed of the computer (with more complex decoding hardware comes longer decode time). [[Memory organisation|Memory organization]] defines how instructions interact with the memory, and how memory interacts with itself.
 
During design [[Emulator|emulation]], emulators can run programs written in a proposed instruction set. Modern emulators can measure size, cost, and speed to determine whether a particular ISA is meeting its goals.
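
The following sketch shows the kind of measurement such an emulator makes. The ISA, its per-instruction cycle costs, and the test program are all hypothetical:
<syntaxhighlight lang="python">
# Design-time emulation sketch: run a program on a software model of a
# proposed (hypothetical) ISA and report static size and dynamic cost.

PROGRAM = [("LOAD", 0, 10), ("LOAD", 1, 0),    # r0=10, r1=0
           ("ADD", 1, 0), ("DEC", 0),          # r1 += r0; r0 -= 1
           ("JNZ", 0, 2), ("HALT",)]           # loop while r0 != 0

COST = {"LOAD": 1, "ADD": 1, "DEC": 1, "JNZ": 2, "HALT": 1}  # assumed cycles

def emulate(program):
    regs, pc, cycles, executed = [0, 0, 0, 0], 0, 0, 0
    while True:
        inst = program[pc]
        op = inst[0]
        cycles += COST[op]
        executed += 1
        pc += 1
        if op == "LOAD": regs[inst[1]] = inst[2]
        elif op == "ADD": regs[inst[1]] += regs[inst[2]]
        elif op == "DEC": regs[inst[1]] -= 1
        elif op == "JNZ" and regs[inst[1]] != 0: pc = inst[2]
        elif op == "HALT": break
    return regs, cycles, executed

regs, cycles, executed = emulate(PROGRAM)
print(f"static size: {len(PROGRAM)} instructions, "
      f"dynamic: {executed} executed, {cycles} model cycles, r1={regs[1]}")
</syntaxhighlight>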
 
===Computer organization===
{{main | Microarchitecture}}
Computer organization helps optimize performance-based products. For example, software engineers need to know the [[processing power]] of [[Processor (computing)|processors]]. They may need to optimize software in order to gain the most performance for the lowest price. This can require quite a detailed analysis of the computer's organization. For example, in an [[SD card]], the designers might need to arrange the card so that the most data can be processed in the fastest possible way.
 
Computer organization also helps plan the selection of a processor for a particular project. [[Multimedia]] projects may need very rapid data access, while [[virtual machine]]s may need fast interrupts. Sometimes certain tasks need additional components as well. For example, a computer capable of running a virtual machine needs [[virtual memory]] hardware so that the memory of different virtual computers can be kept separated. Computer organization and features also affect power consumption and processor cost.
 
===Implementation===
Once an [[Instruction set architecture|instruction set]] and [[microarchitecture]] have been designed, a practical machine must be developed. This design process is called the ''implementation''. Implementation is usually not considered architectural design, but rather hardware [[Engineering design process|design engineering]]. Implementation can be further broken down into several steps:
* '''Logic implementation''' designs the circuits required at a [[Logic gate|logic-gate]] level.
* '''Circuit implementation''' does [[transistor]]-level designs of basic elements (e.g., gates, [[multiplexer]]s, [[Flip-flop (electronics)|latches]]) as well as of some larger blocks ([[Arithmetic logic unit|ALU]]s, caches etc.) that may be implemented at the logic-gate level, or even at the physical level if the design calls for it.
* '''Physical implementation''' draws physical circuits. The different circuit components are placed in a chip [[Floorplan (microelectronics)|floor plan]] or on a board and the wires connecting them are created.
* '''Design validation''' tests the computer as a whole to see if it works in all situations and all timings. Once the design validation process starts, the design at the logic level is tested using logic emulators. However, this is usually too slow to run a realistic test. So, after making corrections based on the first test, prototypes are constructed using field-programmable gate arrays ([[FPGA]]s). Most hobby projects stop at this stage. The final step is to test prototype integrated circuits, which may require several redesigns.
 
For [[Central processing unit|CPU]]s, the entire implementation process is organized differently and is often referred to as [[CPU design]].
 
==Design goals==
The exact form of a computer system depends on the constraints and goals. Computer architectures usually trade off standards, [[Electric power|power]] versus [[Computer performance|performance]], cost, memory capacity, [[latency (engineering)|latency]] (the time it takes for information to travel from one node to another), and throughput. Sometimes other considerations, such as features, size, weight, reliability, and expandability, are also factors.

Generally, cost is held constant, determined by either system or commercial requirements, and the remaining variables, such as latency, throughput, convenience, storage capacity, and input-output, are adjusted to meet the cost target. The general scheme of optimization is to budget these quantities across the different parts of the computer system. In a balanced computer system, the data rate is constant for all parts of the system, and cost is allocated proportionally to assure this. The exact form of the trade-offs depends on whether the system is being optimized to minimize latency or to maximize throughput.
 
The most common scheme does an in-depth power analysis and figures out how to keep power consumption low while maintaining adequate performance.
 
===Performance===
Modern computer performance is often described in [[instructions per cycle]] (IPC), which measures the efficiency of the architecture at any clock frequency; a faster IPC rate means the computer is faster. Older computers had IPC counts as low as 0.1, while modern processors easily reach nearly 1. [[Superscalar]] processors may reach three to five IPC by executing several instructions per clock cycle.{{citation needed|date=January 2020}}
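
IPC connects to execution time through the classic performance equation: time = instructions ÷ IPC ÷ clock rate. The workload and machine numbers below are illustrative assumptions, not measurements:
<syntaxhighlight lang="python">
# The classic performance equation, with made-up numbers for illustration:
#   time = instruction_count / IPC / clock_rate

def run_time(instructions, ipc, clock_hz):
    """Seconds to execute a workload at a given IPC and clock rate."""
    cycles = instructions / ipc
    return cycles / clock_hz

workload = 2_000_000_000                             # 2 billion instructions
old = run_time(workload, ipc=0.5, clock_hz=3.2e9)    # modest IPC, fast clock
new = run_time(workload, ipc=2.0, clock_hz=2.8e9)    # superscalar, slower clock
print(f"old: {old:.3f}s  new: {new:.3f}s  speedup: {old / new:.1f}x")
# The lower-clocked machine wins because IPC matters as much as frequency.
</syntaxhighlight>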
 
Counting machine-language instructions would be misleading because they can do varying amounts of work in different ISAs. The "instruction" in the standard measurements is not a count of the ISA's machine-language instructions, but a unit of measurement, usually based on the speed of the [[VAX]] computer architecture.
 
Many people used to measure a computer's speed by the [[clock rate]] (usually in [[MHz]] or GHz). This refers to the cycles per second of the main clock of the [[Central processing unit|CPU]]. However, this metric is somewhat misleading, as a machine with a higher clock rate may not necessarily have greater performance. As a result, manufacturers have moved away from clock speed as a measure of performance.

Throughput is the absolute processing power of the computer system. In most computer systems, throughput is limited by the slowest piece of hardware in use at a given time; this may be the input and output (I/O) devices, the CPU, the memory chips, or the connection (or "bus") between the memory, the CPU, and the I/O. The bottleneck most acceptable to users is the speed of input, because the computer then seems infinitely fast. General-purpose computers like PCs usually maximize throughput in an attempt to increase user satisfaction.
 
Other factors influence speed, such as the mix of [[functional unit]]s, [[computer bus|bus]] speeds, available memory, and the type and order of instructions in the programs.
 
There are two main types of speed: [[Latency (engineering)|latency]] and [[throughput]]. Latency is the time between the start of a process and its completion. Throughput is the amount of work done per unit time. [[Interrupt latency]] is the guaranteed maximum response time of the system to an electronic event (like when the disk drive finishes moving some data).
 
Performance is affected by a very wide range of design choices — for example, [[Pipeline (computing)|pipelining]] a processor usually makes latency worse, but makes throughput better. Computers that control machinery usually need low interrupt latencies. These computers operate in a [[real-time computing|real-time]] environment and fail if an operation is not completed in a specified amount of time. For example, computer-controlled anti-lock brakes must begin braking within a predictable and limited time period after the brake pedal is sensed or else failure of the brake will occur.
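
The pipelining trade-off can be put in rough numbers. Assume, purely for illustration, an unpipelined datapath of 10 ns per instruction, split into 5 stages with 1 ns of latch overhead each:
<syntaxhighlight lang="python">
# Toy model of the pipelining trade-off, with assumed timings.
unpipelined_latency = 10.0                # ns per instruction
stages, overhead = 5, 1.0                 # pipeline depth, latch overhead (ns)
stage_time = unpipelined_latency / stages + overhead   # 3 ns per stage

pipelined_latency = stages * stage_time   # 15 ns: latency got WORSE
throughput_unpipelined = 1 / unpipelined_latency       # 0.10 inst/ns
throughput_pipelined = 1 / stage_time                  # 0.33 inst/ns: better

print(f"latency: {unpipelined_latency} -> {pipelined_latency} ns")
print(f"throughput: {throughput_unpipelined:.2f} -> "
      f"{throughput_pipelined:.2f} inst/ns")
</syntaxhighlight>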
"[[Interrupt]] [[latency]]" is the guaranteed maximum response time of the software to an event such as the click of a mouse or the reception of data by a modem. This number is affected by a very wide range of design choices. Computers that control machinery usually need low interrupt latencies, because the machine can't, won't or should not wait. For example, computer-controlled anti-lock brakes should not wait for the computer to finish what it's doing- they should brake.
 
[[Benchmark (computing)|Benchmarking]] takes all these factors into account by measuring the time a computer takes to run through a series of test programs. Although benchmarking shows strengths, it should not be the only factor in choosing a computer. Often the measured machines split on different measures; for example, one system might handle scientific applications quickly, while another might render [[video game]]s more smoothly. Furthermore, designers may target and add special features to their products, through hardware or software, that permit a specific benchmark to execute quickly but do not offer similar advantages to general tasks.
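
At its core, benchmarking is careful timing of representative work. A minimal harness might look like the following sketch; the workload here is a stand-in for a real test program:
<syntaxhighlight lang="python">
# Minimal benchmarking sketch: time a repeatable piece of work and report
# the best of several runs (the best run suffers the least interference).
import time

def benchmark(work, repeats=5):
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        work()
        best = min(best, time.perf_counter() - start)
    return best

def workload():                      # a stand-in "test program"
    sum(i * i for i in range(1_000_000))

print(f"best run: {benchmark(workload) * 1e3:.1f} ms")
</syntaxhighlight>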
 
===Power efficiency===
{{Main|Low-power electronics|Performance per watt}}
Power efficiency is another important measurement in modern computers. Higher power efficiency can often be traded for lower speed or higher cost. The typical measurement when referring to power consumption in computer architecture is MIPS/W (millions of instructions per second per watt).
 
Modern circuits require less power per [[transistor]] as the number of transistors per chip grows.<ref>{{Cite web|url=http://eacharya.inflibnet.ac.in/data-server/eacharya-documents/53e0c6cbe413016f23443704_INFIEP_33/192/ET/33-192-ET-V1-S1__ssed_unit_4_module_10_integrated_circuits_and_fabrication_e-text.pdf|title=Integrated circuits and fabrication|access-date=8 May 2017}}</ref> Total chip power nevertheless rises with the transistor count, because each added transistor needs its own power supply and new pathways built to power it. However, the number of transistors per chip is starting to increase at a slower rate. Therefore, power efficiency is starting to become as important as, if not more important than, fitting more and more transistors into a single chip. Recent processor designs have shown this emphasis as they put more focus on power efficiency rather than cramming as many transistors into a single chip as possible.<ref>{{Cite web|url=http://www.samsung.com/semiconductor/minisite/Exynos/w/solution/mod_ap/8895/?CID=AFL-hq-mul-0813-11000170|title=Exynos 9 Series (8895)|website=Samsung|access-date=8 May 2017}}</ref> In the world of [[embedded computers]], power efficiency has long been an important goal next to throughput and latency.
 
===Shifts in market demand===
Increases in clock frequency have grown more slowly over the past few years, compared to power reduction improvements. This has been driven by the end of [[Moore's Law]] and demand for longer [[battery life]] and reductions in size for [[mobile technology]]. This change in focus from higher clock rates to power consumption and miniaturization can be shown by the significant reductions in power consumption, as much as 50%, that were reported by [[Intel]] in their release of the [[Haswell (microarchitecture)|Haswell microarchitecture]]; where they dropped their power consumption benchmark from 30–40 [[watt]]s down to 10–20 watts.<ref>{{Cite web|url=http://www.intel.com/content/dam/doc/white-paper/resources-xeon-measuring-processor-power-paper.pdf|title=Measuring Processor Power TDP vs ACP|date=April 2011|website=Intel|access-date=5 May 2017}}</ref> Comparing this to the processing speed increase of 3 GHz to 4 GHz (2002 to 2006), it can be seen that the focus in research and development is shifting away from clock frequency and moving towards consuming less power and taking up less space.<ref>{{Cite web |date=24 April 2012 |title=History of Processor Performance |url=https://www.cs.columbia.edu/~sedwards/classes/2012/3827-spring/advanced-arch-2011.pdf |access-date=5 May 2017 |website=cs.columbia.edu}}</ref>
 
==See also==
{{Portal|Electronics}}
{{cmn|colwidth=30em|
* [[Bit-serial architecture]]
* [[Comparison of CPU architectures]]
* [[Computer hardware]]
* [[CPU design]]
* [[Dataflow architecture]]
* [[Floating point]]
* [[Flynn's taxonomy]]
* [[Harvard architecture]] ([[Modified Harvard architecture|Modified]])
* [[Influence of the IBM PC on the personal computer market]]
* [[Orthogonal instruction set]]
* [[Reconfigurable computing]]
* [[Software architecture]]
* [[Transport triggered architecture]]
* [[Von Neumann architecture]]
}}
 
==References==
{{Reflist}}
 
==Sources==
* {{Cite book |last=[[John L. Hennessy]] and [[David Patterson (scientist)|David Patterson]] |title=Computer Architecture: A Quantitative Approach |publisher=Morgan Kaufmann |edition=Fourth |year=2006 |isbn=978-0-12-370490-0 |url=http://www.elsevierdirect.com/product.jsp?isbn=9780123704900}}
* [[Robert S. Barton|Barton, Robert S.]], "Functional Design of Computers", ''Communications of the ACM'' 4(9): 405 (1961).
* Barton, Robert S., "A New Approach to the Functional Design of a Digital Computer", ''Proceedings of the Western Joint Computer Conference'', May 1961, pp.&nbsp;393–396. About the design of the Burroughs [[Burroughs large systems|B5000]] computer.
* [[Gordon Bell|Bell, C. Gordon]]; and [[Allen Newell|Newell, Allen]] (1971). [http://research.microsoft.com/en-us/um/people/gbell/Computer_Structures__Readings_and_Examples/contents.html "Computer Structures: Readings and Examples"], McGraw-Hill.
* [[Gerrit Blaauw|Blaauw, G.A.]], and [[Fred Brooks|Brooks, F.P., Jr.]], [http://domino.research.ibm.com/tchjr/journalindex.nsf/d9f0a910ab8b637485256bc80066a393/95dc427e3fd3024a85256bfa006859f7?OpenDocument "The Structure of System/360, Part I-Outline of the Logical Structure"], ''IBM Systems Journal'', vol. 3, no. 2, pp.&nbsp;119–135, 1964.
* {{Cite book |last= Tanenbaum |first=Andrew S. |author-link=Andrew S. Tanenbaum |title=Structured Computer Organization |year=1979 |publisher=Prentice-Hall |___location=[[Englewood Cliffs, New Jersey]] |isbn=0-13-148521-0}}
 
==External links==
{{Commons category}}
* [https://www.youtube.com/user/cmu18447 Carnegie Mellon Computer Architecture Lectures]
* [http://portal.acm.org/toc.cfm?id=SERIES416&type=series&coll=GUIDE&dl=GUIDE&CFID=41492512&CFTOKEN=82922478 ISCA: Proceedings of the International Symposium on Computer Architecture]
* [http://www.microarch.org/ Micro: IEEE/ACM International Symposium on Microarchitecture]
* [https://web.archive.org/web/20050528085407/http://www.hpcaconf.org/ HPCA: International Symposium on High Performance Computer Architecture]
* [http://portal.acm.org/toc.cfm?id=SERIES311&type=series&coll=GUIDE&dl=GUIDE&CFID=41492415&CFTOKEN=3676847 ASPLOS: International Conference on Architectural Support for Programming Languages and Operating Systems]
* [http://www.acm.org/taco/ ACM Transactions on Architecture and Code Optimization]
* [https://www.computer.org/csdl/journal/tc IEEE Transactions on Computers]
* {{webarchive|url=https://web.archive.org/web/20171031185802/http://www-scf.usc.edu/~inf520/downloads/The%20von%20Neumann%20Architecture%20of%20Computer%20Systems.pdf|title=The von Neumann Architecture of Computer Systems|date=2017-10-31}}
{{Computer science}}
{{Digital electronics}}
{{Authority control}}
 
==CPU design==
To a large extent, the design of a [[central processing unit]] is the design of its [[control unit]]. The modern (i.e., 1965 to 1985) way to design control logic is to write a [[microprogram]].

CPU design was originally an ad hoc process: just getting a CPU to work was a substantial governmental and technical achievement.

Key design innovations include the [[cache]], [[virtual memory]], [[instruction pipelining]], [[CISC]], [[RISC]], the [[virtual machine]], [[emulation]], the [[microprogram]], and the [[stack]].

The major problem with early computers was that a program for one would not work on others. In 1962, IBM bet the company that microprogrammed computers, all emulating a single reference computer, could provide a family of computers that could all run the same software. Each computer would be targeted at a specific price point. As users' requirements grew, they could move up to larger computers.
 
This computer family was called the [[IBM System/360|System/360]] (later extended as the System/370), and updated but compatible computers were still being sold as of 2001.

IBM chose to make the reference instruction set quite complex and very capable. This was a conscious choice: the "[[control store]]" containing the [[microprogram]] was relatively small and could be made with very fast memory. Another important effect was that a single instruction could describe quite a complex sequence of operations. Thus the computers would generally have to fetch fewer instructions from the main memory, which could therefore be made slower, smaller, and less expensive for a given combination of speed and price.

An often-overlooked feature of the IBM 360 instruction set was that it was among the first instruction sets designed for data processing rather than purely mathematical calculation. The crucial innovation was that memory was designed to be addressed in units of a single printable character, the "byte". Also, the instruction set was designed to manipulate not just simple integer numbers, but also text, scientific floating-point numbers (similar to the numbers used in a calculator), and the decimal arithmetic needed by accounting systems.

Another important feature was that the IBM [[register]] set was [[binary]], a feature proven earlier on machines such as MIT's [[Whirlwind I|Whirlwind]] computer. Binary arithmetic is substantially cheaper to implement with [[digital logic]], because it requires fewer [[electronic]] devices to store the same number.

Almost all subsequent computers included these innovations in some form. This basic set of features is now called a "[[complex instruction set computer]]", or CISC (pronounced "sisk").

In many CISCs, an instruction could access either registers or memory, usually in several different ways. This made the CISCs easier to program, because a programmer could remember just thirty to a hundred instructions and a set of three to ten addressing modes, rather than thousands of distinct instructions. This was called an "[[orthogonal instruction set]]".

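Orthogonality can be sketched as operations crossed with addressing modes. Everything below (registers, memory, mode names) is hypothetical; the point is that operations and modes are learned separately, and any combination works.
<syntaxhighlight lang="python">
# Sketch of orthogonality: a few operations crossed with a few addressing
# modes yields many instruction combinations from few separately-learned parts.

REGS = [5, 7, 0, 0]
MEM = {100: 40}

def operand(mode, value):
    if mode == "immediate":           # the value itself
        return value
    if mode == "register":            # value names a register
        return REGS[value]
    if mode == "memory":              # value is an address
        return MEM[value]
    if mode == "register_indirect":   # register holds the address
        return MEM[REGS[value]]
    raise ValueError(mode)

OPS = {"ADD": lambda a, b: a + b, "SUB": lambda a, b: a - b}

def execute(op, dst_reg, mode, value):
    REGS[dst_reg] = OPS[op](REGS[dst_reg], operand(mode, value))

execute("ADD", 0, "immediate", 2)     # r0 = 5 + 2
execute("ADD", 0, "memory", 100)      # r0 = 7 + 40
assert REGS[0] == 47
# 2 operations x 4 modes = 8 combinations from 6 separately-learned pieces.
</syntaxhighlight>
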
In the early 1980s, researchers at UC Berkeley discovered that compiled programs in most computer languages used only a small subset of a CISC's instructions. They realized that by making the computer simpler and less orthogonal, they could make it faster and less expensive at the same time.

Computer designs based on this theory were called [[reduced instruction set computer]]s, or RISCs. RISCs generally had larger numbers of registers, accessed by simpler instructions, with a few instructions specifically dedicated to loading and storing data in memory.

RISCs nevertheless failed in most markets: most computers and microprocessors still follow the "complex instruction set computer" (CISC) model. In modern computers, CISCs remain in use because they reduce the cost of the memory system and remain compatible with pre-existing software.

More recently, engineers have found ways to compress reduced instruction sets so they fit in even smaller memory systems than ordinary RISCs allow. In applications that need no compatibility with older software, compressed RISCs are coming to dominate sales.

Another approach to RISC was the "niladic", or "zero-address", instruction set. This approach recognized that the majority of the space in an instruction is used to identify its operands. These machines instead placed the operands on a push-down (last-in, first-out) stack. The instruction set was supplemented with a few instructions to fetch and store data in memory. Most used simple caching to provide extremely fast RISC machines with very compact code. Another benefit was extremely low interrupt latency, lower than on most CISC machines (a rare trait among RISC machines).

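A zero-address machine is easy to sketch, because arithmetic instructions carry no operand fields at all. The toy program below computes (2 + 3) × 4 on an implicit stack:
<syntaxhighlight lang="python">
# Minimal zero-address (stack) machine sketch. Arithmetic instructions name
# no operands: they pop from and push to an implicit stack.

PROGRAM = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]

def run(program):
    stack = []
    for inst in program:
        if inst[0] == "PUSH":          # the only instruction with an operand
            stack.append(inst[1])
        elif inst[0] == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif inst[0] == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

assert run(PROGRAM) == 20              # (2 + 3) * 4
</syntaxhighlight>
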
An early zero-address computer, developed by [[Charles Moore]], placed six 5-bit instructions in a 32-bit word, and was an early very-long-instruction-word design.

Commercial variants were mostly characterized as "[[FORTH]]" machines, and probably failed because that language became unpopular. Also, the machines were developed by defense contractors at exactly the time that the Cold War ended. Loss of funding may have broken up the development teams before the companies could perform adequate commercial marketing.

In the 1980s, to make computer systems faster, designers began using several "[[execution unit]]s" operating at overlapping offsets in time. At first, one was used to calculate addresses and another to calculate user data; then each of these uses began to subdivide further. In modern CPUs, as many as eight arithmetic logic units ([[ALU]]s) are coordinated to execute a stream of instructions. These CPUs can execute several instructions per clock cycle, where classical CISCs would take up to twelve clock cycles per instruction, or more for some forms of arithmetic. The resulting microcode is complex and error-prone, and the electronics needed to coordinate these ALUs require many transistors, increasing power consumption and heat.

In the early 1990s, a significant innovation was the realization that the coordination of a multiple-ALU computer could be moved into the [[compiler]], the software that translates a programmer's instructions into machine-level instructions. In software, this coordination consumes no hardware resources or power, and it can take advantage of more knowledge about the computer program. A "[[very long instruction word]]" (VLIW) computer simply has a wide instruction with sub-fields that directly command each ALU.

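The compiler's side of this bargain can be sketched as a scheduling problem: group operations into wide words so that no operation lands in the same word as one it depends on. The operations and the two-slot instruction word below are hypothetical, and the dependence check is deliberately simple.
<syntaxhighlight lang="python">
# Sketch of VLIW scheduling: the compiler packs independent operations into
# wide instruction words, one slot per ALU, instead of hardware doing it
# at run time.

OPS = [  # (destination, sources) in program order; names are hypothetical
    ("a", ()), ("b", ()),        # two loads, independent of everything
    ("c", ("a", "b")),           # c = a op b
    ("d", ("a",)),               # d = f(a), independent of c
    ("e", ("c", "d")),           # depends on both c and d
]

def pack(ops, slots=2):
    words, ready = [], set()
    pending = list(ops)
    while pending:
        word = []
        for op in list(pending):
            dest, srcs = op
            if len(word) < slots and all(s in ready for s in srcs):
                word.append(dest)
                pending.remove(op)
        ready |= set(word)        # results visible to the *next* word
        words.append(word)
    return words

print(pack(OPS))   # [['a', 'b'], ['c', 'd'], ['e']]: 3 wide words for 2 ALUs
</syntaxhighlight>
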
There were several unsuccessful attempts to commercialize [[VLIW]]. The basic problem was that a VLIW computer does not scale to different price and performance points, as a microprogrammed computer can. Also, VLIW computers maximize throughput, not latency, so they were not attractive to the engineers designing controllers and other computers embedded in machinery. The embedded systems markets had often pioneered other computer improvements by providing a large market that did not care about compatibility with older software.

More recently, a company called [[Transmeta]] took the radical step of placing the compiler in the central processing unit, making it translate from a reference instruction set (in their case, the [[80386]] instruction set) to an internal [[VLIW]] instruction set. This approach appears technically and commercially feasible. It may eventually dominate CPU design because it provides the hardware simplicity, low power, and speed of VLIW RISC together with the compact main memory system and software compatibility provided by CISC.

The majority of computer systems in use today are embedded in other machinery, such as telephones, clocks, appliances, vehicles, and infrastructure. These "embedded systems" usually have modest memory requirements, modest program sizes, and often simple but unusual input/output systems. For example, most embedded systems lack keyboards, screens, disks, printers, or other recognizable I/O devices of a personal computer. They may control electric motors, relays, or voltages, and read switches, variable resistors, or other electronic devices. Often, the only I/O device readable by a human is a single light-emitting diode, and severe cost or power constraints will eliminate even that.

The most common trade-off in embedded systems is to minimize interrupt latency; a lower latency is often far more useful than another kilobyte of unused memory.

For example, low-latency CPUs generally have relatively few registers in their central processing units. A register is a small, fast storage element that holds a number during a calculation. When an electronic device causes an interrupt, the intermediate results in the registers have to be saved before the software for the interrupt can run, and restored after it is done. If there are more registers, this saving and restoring process takes more time, increasing the latency.

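Rough arithmetic shows the effect; the cycle costs and register counts below are assumptions for illustration only.
<syntaxhighlight lang="python">
# Back-of-the-envelope sketch: interrupt entry must save, and exit must
# restore, the register file, so more registers mean more latency.

def interrupt_overhead_cycles(num_registers, cycles_per_store=2):
    save = num_registers * cycles_per_store
    restore = num_registers * cycles_per_store
    return save + restore

for regs in (8, 16, 32):
    cycles = interrupt_overhead_cycles(regs)
    print(f"{regs:2d} registers -> {cycles} cycles of save/restore overhead")
</syntaxhighlight>
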
Another common problem involves virtual memory. Historically, random-access memory has been thousands of times more expensive than rotating mechanical storage. For businesses and many general computing tasks, it is a good compromise never to let the computer run out of memory, an event that would halt the program and greatly inconvenience the user. Instead of halting the program, many computer systems save less-frequently-used blocks of memory to the rotating mechanical storage; in essence, the mechanical storage becomes an extension of main memory. However, mechanical storage is thousands of times slower than electronic memory. Thus, almost all general-purpose computing systems use "[[virtual memory]]" and also have unpredictable interrupt latencies.

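Demand paging can be sketched with a tiny least-recently-used "RAM" backed by a slow "disk"; all sizes and costs below are illustrative assumptions. The large, data-dependent cost of a miss is what makes latency unpredictable.
<syntaxhighlight lang="python">
# Minimal sketch of demand paging: a tiny "RAM" backed by a slow "disk".
# A hit is fast; a miss (page fault) evicts a page and costs far more.
from collections import OrderedDict

RAM_FRAMES = 2
HIT_COST, FAULT_COST = 1, 10_000           # assumed relative access times
ram = OrderedDict()                        # page -> data, kept in LRU order

def access(page):
    if page in ram:
        ram.move_to_end(page)              # LRU bookkeeping
        return HIT_COST
    if len(ram) >= RAM_FRAMES:
        ram.popitem(last=False)            # evict the least recently used page
    ram[page] = f"contents of page {page}" # fetched from the slow "disk"
    return FAULT_COST

total = sum(access(p) for p in [0, 1, 0, 2, 0, 1])
print(f"total cost: {total}")  # faults on 0, 1, 2, and again on 1
</syntaxhighlight>
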
The newest development seems to be [[optical computing]], which may eventually cause a radical redesign of computer systems.

The most interesting near-term possibility would be to eliminate the bus. Modern vertical-cavity laser diodes enable this change. In theory, an optical computer's components could connect directly through a holographic or phased open-air switching system. This would provide a large increase in effective speed and design flexibility, and a large reduction in cost. Since a computer's connectors are also its most likely failure point, a busless system might be more reliable, as well.

Another longer-term possibility is to use light instead of electricity for the logic itself. This would run about 30% faster and use less power, and would permit a direct interface with quantum computational devices. The chief problem with this approach is that, for the foreseeable future, electronic devices are faster, smaller (i.e., cheaper), and more reliable. An important theoretical problem is that electronic computational elements are already smaller than some wavelengths of light, so even waveguide-based optical logic may be uneconomic compared to electronic logic. The majority of development can therefore be expected to remain focused on electronics.

{{DEFAULTSORT:Computer Architecture}}
[[Category:Computer architecture| ]]
[[Category:Central processing unit]]