Compiler: Difference between revisions

Undid revision 340087993 by 125.60.250.147 (talk)
{{Short description |Software that translates code from one programming language to another}}
{{otheruses4|the computing term|the anime|Compiler (anime)}}
{{About |software to translate computer languages |the manga |Compiler (manga)}}
{{Expert-subject|Computer_science|date=December 2008}}
{{Redirect2|Compile|Compiling|the software company |Compile (company)|other uses |Compilation (disambiguation){{!}}Compilation}}
[[Image:Compiler.svg|right|thumb|350px|A diagram of the operation of a typical multi-language, multi-target compiler.]]
{{Use dmy dates |date=October 2020}}
{{Program execution}}
 
In [[computing]], a '''compiler''' is [[software]] that [[Translator (computing)|translates]] computer code written in one [[programming language]] (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that translate [[source code]] from a [[high-level programming language]] to a [[lower level language |low-level programming language]] (e.g. [[assembly language]], [[object code]], or [[machine code]]) to create an [[executable]] program.<ref>{{cite web |author= |date= |title=Encyclopedia: Definition of Compiler |url=https://www.pcmag.com/encyclopedia/term/compiler |access-date=2 July 2022 |work=PCMag.com}}</ref><ref name="dragon">[[Compilers: Principles, Techniques, and Tools]] by Alfred V. Aho, Ravi Sethi, Jeffrey D. Ullman - Second Edition, 2007</ref>{{rp|p1}}<ref name="SUDARSANAM MALIK FUJITA 2002 pp. 506–515">{{cite book | last1=Sudarsanam | first1=Ashok | last2=Malik | first2=Sharad | last3=Fujita | first3=Masahiro | title=Readings in Hardware/Software Co-Design | chapter=A Retargetable Compilation Methodology for Embedded Digital Signal Processors Using a Machine-Dependent Code Optimization Library | publisher=Elsevier | date=2002 | doi=10.1016/b978-155860702-6/50045-4 | pages=506–515 | isbn=9781558607026 | quote=A compiler is a computer program that translates a program written in a high-level language (HLL), such as C, into an equivalent assembly language program [2]. }}</ref>
A '''compiler''' is a [[computer program]] (or set of programs) that transforms [[source code]] written in a [[programming language|computer language]] (the ''source language'') into another computer language (the ''target language'', often having a binary form known as ''[[object code]]''). The most common reason for wanting to transform source code is to create an [[executable]] program.
 
There are many different types of compilers which produce output in different useful forms. A ''[[cross-compiler]]'' produces code for a different [[Central processing unit|CPU]] or [[operating system]] than the one on which the cross-compiler itself runs. A ''[[bootstrap compiler]]'' is often a temporary compiler, used for compiling a more permanent or better optimized compiler for a language.
The name "compiler" is primarily used for programs that translate source code from a [[high-level programming language]] to a lower level language (e.g., [[assembly language]] or [[machine code]]). A program that translates from a low level language to a higher level one is a ''[[decompiler]]''. A program that translates between high-level languages is usually called a ''[[Translator (computing)|language translator]]'', ''source to source translator'', or ''language converter''. A ''language [[rewriting|rewriter]]'' is usually a program that translates the form of expressions without a change of language.
 
Related software include ''[[decompiler]]s'', programs that translate from low-level languages to higher level ones; programs that translate between high-level languages, usually called ''[[source-to-source compiler]]s'' or ''transpilers''; language ''[[rewriting |rewriter]]s'', usually programs that translate the form of [[Expression (computer science)|expressions]] without a change of language; and ''[[compiler-compiler]]s'', compilers that produce compilers (or parts of them), often in a generic and reusable way so as to be able to produce many differing compilers.
A compiler is likely to perform many or all of the following operations: [[lexical analysis]], [[preprocessing]], [[parsing]], semantic analysis, [[code generation (compiler)|code generation]], and [[code optimization]].
 
A compiler is likely to perform some or all of the following operations, often called phases: [[Preprocessor |preprocessing]], [[lexical analysis]], [[parser |parsing]], [[Semantic analysis (compilers)|semantic analysis]] ([[syntax-directed translation]]), conversion of input programs to an [[intermediate representation]], [[code optimization]] and [[code generation (compiler)|machine specific code generation]]. Compilers generally implement these phases as modular components, promoting efficient design and correctness of [[program transformation|transformation]]s of source input to target output. Program faults caused by incorrect compiler behavior can be very difficult to track down and work around; therefore, compiler implementers invest significant effort to ensure [[compiler correctness]].<ref name="Sun2016">{{cite book |last1=Sun|first1=Chengnian|last2=Le|first2=Vu|last3=Zhang|first3=Qirun|last4=Su|first4=Zhendong|title=Proceedings of the 25th International Symposium on Software Testing and Analysis |chapter=Toward understanding compiler bugs in GCC and LLVM |date=2016|chapter-url=http://dl.acm.org/citation.cfm?doid=2931037.2931074|publisher=ACM|series=ISSTA 2016|pages=294–305|doi=10.1145/2931037.2931074|isbn=9781450343909|s2cid=8339241}}</ref>
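The phases listed above can be illustrated with a minimal sketch. It assumes a toy source language of integers, <code>+</code>, <code>*</code> and parentheses, and a hypothetical stack machine as the target; the function names (<code>lex</code>, <code>parse</code>, <code>generate</code>, <code>run</code>) and the instruction names are illustrative and not taken from any real compiler.

```python
import re

def lex(source):
    """Lexical analysis: turn characters into (kind, value) tokens."""
    tokens = []
    for text in re.findall(r"\d+|\S", source):
        if text.isdigit():
            tokens.append(("num", int(text)))
        elif text in "+*()":
            tokens.append(("op", text))
        else:
            raise SyntaxError(f"unexpected character {text!r}")
    return tokens

def parse(tokens):
    """Parsing: build a nested-tuple tree in which * binds tighter than +."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("eof", None)

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def atom():
        kind, value = advance()
        if kind == "num":
            return ("num", value)
        if (kind, value) == ("op", "("):
            node = expr()
            if advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {value!r}")

    def term():
        node = atom()
        while peek() == ("op", "*"):
            advance()
            node = ("mul", node, atom())
        return node

    def expr():
        node = term()
        while peek() == ("op", "+"):
            advance()
            node = ("add", node, term())
        return node

    tree = expr()
    if peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return tree

def generate(tree):
    """Code generation: emit instructions for the hypothetical stack machine."""
    if tree[0] == "num":
        return [("PUSH", tree[1])]
    op, left, right = tree
    return generate(left) + generate(right) + [(op.upper(),)]

def run(code):
    """Execute the generated instructions to check the translation."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()

code = generate(parse(lex("1 + 2 * 3")))
print(code)
print(run(code))
```

Each function consumes only the output of the previous one, which is the modularity the paragraph describes: a phase can be replaced without touching the others as long as the token and tree shapes stay the same.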
Program faults caused by incorrect compiler behavior can be very difficult to track down and work around, so compiler implementors invest a lot of time ensuring the [[compiler correctness|correctness of their software]].
 
==Comparison with interpreter==
The term [[compiler-compiler]] is sometimes used to refer to a [[parser generator]], a tool often used to help create the [[lexical analysis|lexer]] and [[parser]].
 
With respect to making source code runnable, an [[interpreter (computing)|interpreter]] provides a function similar to that of a compiler, but via a different mechanism. An interpreter executes code without converting it to machine code.<ref name="dragon"/>{{rp|p2}} Some interpreters execute source code while others execute an intermediate form such as [[bytecode]].
==History==
{{Main|History of compiler writing}}
Software for early computers was primarily written in assembly language for many years. Higher level programming languages were not invented until the benefits of being able to reuse software on different kinds of [[CPU]]s started to become significantly greater than the cost of writing a compiler. The very limited [[Computer storage|memory]] capacity of early computers also created many technical problems when implementing a compiler.
 
A program compiled to native code tends to run faster than if interpreted. Environments with a bytecode intermediate form tend toward intermediate speed. [[Just-in-time compilation]] allows for native execution speed with a one-time startup processing time cost.
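The bytecode intermediate form can be observed in Python itself, whose reference implementation compiles source text to bytecode and then interprets it. The sketch below uses only the built-in <code>compile</code> and <code>exec</code> functions and the standard <code>dis</code> module; the sample statement and the file name <code><example></code> are arbitrary choices made for illustration.

```python
import dis

source = "total = sum(n * n for n in range(4))"

# compile() translates the source text into a code object holding bytecode.
# Nothing is executed at this point.
code = compile(source, "<example>", "exec")

# exec() hands the bytecode to the interpreter loop, which runs it.
namespace = {}
exec(code, namespace)
print(namespace["total"])

# dis.dis() lists the bytecode instructions. The exact opcodes differ
# between Python versions, so no particular listing is shown here.
dis.dis(code)
```

The translation step and the execution step are separate calls, which mirrors the distinction drawn above between compiling to an intermediate form and interpreting that form.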
Towards the end of the 1950s, machine-independent programming languages were first proposed. Subsequently, several experimental compilers were developed. The first compiler was written by [[Grace Hopper]], in 1952, for the [[A-0 programming language]]. The [[FORTRAN]]<!-- ###here (only), upper-case FORTRAN is correct, as it was the name used at the time, and on IBM's early compilers ###--> team led by [[John Backus]] at [[IBM]] is generally credited as having introduced the first complete compiler, in 1957. [[COBOL]] was an early language to be compiled on multiple architectures, in 1960.<ref>[http://www.interesting-people.org/archives/interesting-people/199706/msg00011.html IP: "The World's First COBOL Compilers" -- 12 June 1997]</ref>
 
[[Low-level programming language]]s, such as [[assembly language |assembly]] and [[C (programming language)|C]], are typically compiled, especially when speed is a significant concern, rather than [[cross-platform]] support. For such languages, there are more one-to-one correspondences between the source code and the resulting [[machine code]], making it easier for programmers to control the use of hardware.
In many application domains the idea of using a higher level language quickly caught on. Because of the expanding functionality supported by newer [[programming language]]s and the increasing complexity of computer architectures, compilers have become more and more complex.
 
In theory, a programming language can be used via either a compiler or an interpreter, but in practice, each language tends to be used with only one or the other. Nonetheless, it is possible to write a compiler for a language that is commonly interpreted. For example, [[Common Lisp]] can be compiled to Java bytecode (then interpreted by the [[Java virtual machine]]), C code (then compiled to native machine code), or directly to native code.
Early compilers were written in assembly language. The first ''[[self-hosting]]'' compiler &mdash; capable of compiling its own source code in a high-level language &mdash; was created for [[Lisp programming language|Lisp]] by Tim Hart and Mike Levin at [[Massachusetts Institute of Technology|MIT]] in 1962.<ref>[ftp://publications.ai.mit.edu/ai-publications/pdf/AIM-039.pdf T. Hart and M. Levin "The New Compiler", AIM-39] CSAIL Digital Archive - Artificial Intelligence Laboratory Series</ref> Since the 1970s it has become common practice to implement a compiler in the language it compiles, although both [[Pascal (programming language)|Pascal]] and [[C (programming language)|C]] have been popular choices for implementation language. Building a self-hosting compiler is a [[bootstrapping (compilers)|bootstrapping]] problem—the first such compiler for a language must be compiled either by a compiler written in a different language, or (as in Hart and Levin's Lisp compiler) compiled by running the compiler in an [[Interpreter (computing)|interpreter]].
 
== History ==
=== Compilers in education ===
{{Main |History of compiler construction}}
Compiler construction and [[compiler optimization]] are taught at universities and schools as part of the [[computer science]] curriculum. Such courses are usually supplemented with the implementation of a compiler for an [[educational programming language]]. A well-documented example is [[Niklaus Wirth]]'s [[PL/0]] compiler, which Wirth used to teach compiler construction in the 1970s.<ref>[http://www.246.dk/pl0.html "The PL/0 compiler/interpreter"]</ref> In spite of its simplicity, the PL/0 compiler introduced several influential concepts to the field:
[[File:Compiler.svg|upright=1.5|thumb |A diagram of the operation of a typical multi-language, multi-target compiler]]
Theoretical computing concepts developed by scientists, mathematicians, and engineers formed the basis of modern digital computing development during World War II. Primitive binary languages evolved because digital devices only understand ones and zeros and the circuit patterns in the underlying machine architecture. In the late 1940s, assembly languages were created to offer a more workable abstraction of the computer architectures.<ref>{{Cite web |last=Baghai |first=Christian |date=2023-04-04 |title=The Evolution of Programming Languages: From Primitive Binary to High-Level Abstractions |url=https://christianbaghai.medium.com/the-evolution-of-programming-languages-from-primitive-binary-to-high-level-abstractions-7b8e4b7a2521 |access-date=2024-07-10 |website=Medium |language=en}}</ref> The limited [[main memory|memory]] capacity of early computers led to substantial technical challenges when the first compilers were designed. Therefore, the compilation process needed to be divided into several small programs. The front end programs produce the analysis products used by the back end programs to generate target code. As computer technology provided more resources, compiler designs could align better with the compilation process.
 
It is usually more productive for a programmer to use a high-level language, so the development of high-level languages followed naturally from the capabilities offered by digital computers. High-level languages are [[formal language]]s that are strictly defined by their syntax and [[semantics (computer science)|semantics]] which form the high-level language architecture. Elements of these formal languages include:
# Program development by stepwise refinement (also the title of a 1971 paper by Wirth<ref>[http://www.acm.org/classics/dec95/ Book description at the ACM Digital Library]</ref>)
* ''Alphabet'', any finite set of symbols;
# The use of a [[recursive descent parser]]
* ''String'', a finite sequence of symbols;
# The use of [[EBNF]] to specify the syntax of a language
* ''Language'', any set of strings on an alphabet.
# A [[code generation (compiler)|code generator]] producing portable [[P-code]]
# The use of [[T-diagram]]s<ref>T diagrams were first introduced for describing bootstrapping and cross-compiling compilers in McKeeman et al. ''A Compiler Generator'' (1971). Conway described the broader concept before that with his [[UNCOL]] in 1958, to which Bratman added in 1961: H. Bratman, “An alternate form of the ´UNCOL diagram´“, Comm. ACM 4 (March 1961) 3, p. 142. Later on, others, including P.D. Terry, gave an explanation and usage of T-diagrams in their textbooks on the topic of compiler construction. Cf. Terry, 1997, [http://scifac.ru.ac.za/compilers/cha03g.htm Chapter 3]. T-diagrams are also now used to describe client-server interconnectivity on the World Wide Web: cf. Patrick Closhen, et al. 1997: [http://pu.rbg.informatik.tu-darmstadt.de/docs/HJH-19990217-etal-T-diagrams.doc ''T-Diagrams as Visual Language to Illustrate WWW Technology''], Darmstadt University of Technology, Darmstadt, Germany</ref> in the formal description of the [[bootstrapping (compilers)|bootstrapping]] problem
 
The sentences in a language may be defined by a set of rules called a grammar.<ref>Lecture notes. Compilers: Principles, Techniques, and Tools. Jing-Shin Chang. Department of Computer Science & Information Engineering. National Chi-Nan University</ref>
== Compiler output ==
 
[[Backus–Naur form]] (BNF) describes the syntax of "sentences" of a language. It was developed by [[John Backus]] and used for the syntax of [[Algol 60]].<ref>Naur, P. et al. "Report on ALGOL 60". ''Communications of the ACM'' 3 (May 1960), 299–314.</ref> The ideas derive from the [[context-free grammar]] concepts by linguist [[Noam Chomsky]].<ref>{{cite book |title=Syntactic Structures |isbn=978-3-11-017279-9 |first1=Noam |last1=Chomsky |first2=David W. |last2=Lightfoot |publisher=Walter de Gruyter |date=2002}}</ref> "BNF and its [[Extended Backus–Naur form|extensions]] have become standard tools for describing the syntax of programming notations. In many cases, parts of compilers are generated automatically from a BNF description."<ref>{{cite book |title=The Science of Programming |chapter=Appendix 1: Backus-Naur Form |isbn=978-1461259831 |last=Gries |first=David |chapter-url=https://books.google.com/books?id=QFrlBwAAQBAJ&q=1461259835&pg=PA304 |page=304 |publisher=Springer Science & Business Media |date=2012}}</ref>
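How a grammar written in BNF style can drive part of a compiler may be sketched as follows. The grammar is a made-up one for sums and products of single digits, stored as plain data, and a small generic recognizer decides whether a string is a sentence of the language it defines. The names <code>GRAMMAR</code>, <code>match</code> and <code>is_sentence</code> are hypothetical, and the backtracking approach shown only works for grammars without left recursion.

```python
# BNF-style rules, written as data:
#   <expr>  ::= <term> "+" <expr> | <term>
#   <term>  ::= <digit> "*" <term> | <digit>
#   <digit> ::= "0" | "1" | ... | "9"
GRAMMAR = {
    "expr": [["term", "+", "expr"], ["term"]],
    "term": [["digit", "*", "term"], ["digit"]],
    "digit": [[d] for d in "0123456789"],
}

def match(symbol, text, pos):
    """Yield every position at which `symbol` can end when it starts at `pos`."""
    if symbol not in GRAMMAR:
        # Terminal symbol: it must appear literally in the text.
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alternative in GRAMMAR[symbol]:
        ends = [pos]
        for part in alternative:
            # Extend every partial match by the next part of the rule.
            ends = [e2 for e in ends for e2 in match(part, text, e)]
        yield from ends

def is_sentence(text, start="expr"):
    """A string is a sentence if the start symbol can consume all of it."""
    return len(text) in match(start, text, 0)

print(is_sentence("1+2*3"))
print(is_sentence("1+"))
```

Because the recognizer reads the rules from <code>GRAMMAR</code> rather than hard-coding them, changing the language only requires changing the data, which is the idea behind generating parts of a compiler from a grammar description.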
One classification of compilers is by the [[platform (computing)|platform]] on which their generated code executes. This is known as the ''target platform.''
A ''native'' or ''hosted'' compiler is one whose output is intended to directly run on the same type of computer and operating system that the compiler itself runs on. The output of a [[cross compiler]] is designed to run on a different platform. Cross compilers are often used when developing software for [[embedded system]]s that are not intended to support a software development environment.
 
Between 1942 and 1945, [[Konrad Zuse]] designed the first (algorithmic) programming language for computers called {{lang|de|[[Plankalkül]]}} ("Plan Calculus"). Zuse also envisioned a {{lang|de|Planfertigungsgerät}} ("Plan assembly device") to automatically translate the mathematical formulation of a program into machine-readable [[punched film stock]].<ref name="Hellige_2004"/> While no actual implementation occurred until the 1970s, it presented concepts later seen in [[APL (programming language)|APL]] designed by Ken Iverson in the late 1950s.<ref>{{cite book |title=A Programming Language |url=https://archive.org/details/programminglangu00iver_0 |url-access=registration |first=Kenneth E. |last=Iverson |isbn=978-0-471430-14-8 |publisher=John Wiley & Sons |date=1962}}</ref> APL is a language for mathematical computations.
The output of a compiler that produces code for a [[virtual machine]] (VM) may or may not be executed on the same platform as the compiler that produced it. For this reason such compilers are not usually classified as native or cross compilers.
 
Between 1949 and 1951, [[Heinz Rutishauser]] proposed [[Superplan]], a high-level language and automatic translator.<ref name="Rutishauser_1951"/> His ideas were later refined by [[Friedrich L. Bauer]] and [[Klaus Samelson]].<ref name="Fothe-Wilke_2014"/>
=== Compiled versus interpreted languages ===
 
High-level language design during the formative years of digital computing provided useful programming tools for a variety of applications:
Higher-level programming languages are generally divided for convenience into [[compiled language]]s and [[interpreted language]]s. However, in practice there is rarely anything about a language that ''requires'' it to be exclusively compiled, or exclusively interpreted; although it is possible to design languages that may be inherently interpretive. The categorization usually reflects the most popular or widespread implementations of a language &mdash; for instance, BASIC is sometimes called an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters.
* [[FORTRAN]] (Formula Translation) for engineering and science applications is considered to be one of the first high-level languages to be actually implemented and the first to have an optimizing compiler.<ref>{{cite book |first=John |last=Backus |chapter=The history of FORTRAN I, II and III |website=Softwarepreservation.org |title=History of Programming Languages |chapter-url=http://www.softwarepreservation.org/projects/FORTRAN/paper/p25-backus.pdf |archive-url=https://ghostarchive.org/archive/20221010/http://www.softwarepreservation.org/projects/FORTRAN/paper/p25-backus.pdf |archive-date=2022-10-10 |url-status=live}}</ref>{{third-party inline|date=October 2024}}
* [[COBOL]] (Common Business-Oriented Language) evolved from [[A-0 System|A-0]] and [[FLOW-MATIC]] to become the dominant high-level language for business applications.<ref>Porter Adams, Vicki (5 October 1981). "Captain Grace M. Hopper: the Mother of COBOL". InfoWorld. 3 (20): 33. ISSN 0199-6649.</ref>
* [[Lisp (programming language)|LISP]] (List Processor) for symbolic computation.<ref>McCarthy, J.; Brayton, R.; Edwards, D.; Fox, P.; Hodes, L.; Luckham, D.; Maling, K.; Park, D.; Russell, S. (March 1960). "LISP I Programmers Manual" (PDF). Boston, Massachusetts: Artificial Intelligence Group, M.I.T. Computation Center and Research Laboratory.</ref>
 
Compiler technology evolved from the need for a strictly defined transformation of the high-level source program into a low-level target program for the digital computer. The compiler could be viewed as a front end to deal with the analysis of the source code and a back end to synthesize the analysis into the target code. Optimization between the front end and back end could produce more efficient target code.<ref>Compilers Principles, Techniques, & Tools 2nd edition by Aho, Lam, Sethi, Ullman {{ISBN |0-321-48681-1}}</ref>
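The effect of an optimization step placed between the front end and the back end can be shown with constant folding, one of the simplest optimizations. The sketch assumes that a front end has already produced a nested-tuple tree and that the back end emits instructions for a hypothetical stack machine; the tree shape, the function names <code>fold</code> and <code>emit</code>, and the instruction names are illustrative only.

```python
def fold(tree):
    """Constant folding: evaluate operations whose operands are both literals."""
    if tree[0] in ("num", "var"):
        return tree
    op, left, right = tree[0], fold(tree[1]), fold(tree[2])
    if left[0] == "num" and right[0] == "num":
        value = left[1] + right[1] if op == "add" else left[1] * right[1]
        return ("num", value)
    return (op, left, right)

def emit(tree):
    """Back end: emit instructions for a hypothetical stack machine."""
    if tree[0] == "num":
        return [f"PUSH {tree[1]}"]
    if tree[0] == "var":
        return [f"LOAD {tree[1]}"]
    return emit(tree[1]) + emit(tree[2]) + [tree[0].upper()]

# Tree a front end might produce for the expression x + 2 * 3.
tree = ("add", ("var", "x"), ("mul", ("num", 2), ("num", 3)))

print(emit(tree))        # code generated without optimization
print(emit(fold(tree)))  # code generated after constant folding
```

The folded tree needs three instructions instead of five, and neither the front end nor the back end had to change to obtain the shorter code.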
Modern trends toward [[just-in-time compilation]] and [[bytecode|bytecode interpretation]] at times blur the traditional categorizations of compilers and interpreters.
 
Some early milestones in the development of compiler technology:
Some language specifications spell out that implementations ''must'' include a compilation facility; for example, [[Common Lisp]]. However, there is nothing inherent in the definition of Common Lisp that stops it from being interpreted. Other languages have features that are very easy to implement in an interpreter, but make writing a compiler much harder; for example, [[APL (programming language)|APL]], [[SNOBOL4]], and many scripting languages allow programs to construct arbitrary source code at runtime with regular string operations, and then execute that code by passing it to a special evaluation function. To implement these features in a compiled language, programs must usually be shipped with a [[runtime library]] that includes a version of the compiler itself.
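The run-time construction of source code described above can be sketched in Python, where the built-in <code>eval</code> function plays the role of the special evaluation function. The helper <code>make_polynomial</code> is a hypothetical example written for this illustration; passing untrusted text to <code>eval</code> is unsafe and is done here only to show the mechanism.

```python
def make_polynomial(coefficients):
    """Build source text for a polynomial at run time, then evaluate it."""
    terms = [f"{c} * x ** {i}" for i, c in enumerate(coefficients)]
    source = "lambda x: " + " + ".join(terms)
    # eval() must translate the freshly built string before it can run,
    # so the language implementation has to be present at run time.
    return source, eval(source)

source, poly = make_polynomial([1, 0, 2])
print(source)
print(poly(3))
```

The string passed to <code>eval</code> does not exist until the program runs, so it cannot be translated ahead of time; this is why a compiled implementation of such a feature usually has to ship a translator with the program.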
* ''May 1952'': [[Grace Hopper]]'s team at [[Remington Rand]] wrote the compiler for the [[A-0 System|A-0]] programming language (and coined the term ''compiler'' to describe it),<ref>{{cite book |last1=Hopper |first1=Grace Murray |title=Proceedings of the 1952 ACM national meeting (Pittsburgh) on - ACM '52 |chapter=The education of a computer |date=1952 |pages=243–249 |doi=10.1145/609784.609818 |s2cid=10081016|doi-access=free }}</ref><ref>{{cite book |last1=Ridgway |first1=Richard K. |title=Proceedings of the 1952 ACM national meeting (Toronto) on - ACM '52 |chapter=Compiling routines |date=1952 |pages=1–5 |doi=10.1145/800259.808980 |s2cid=14878552|doi-access=free }}</ref><ref>{{cite web | title=List of early compilers and assemblers | url=http://shape-of-code.coding-guidelines.com/2017/05/21/evidence-for-28-possible-compilers-in-1957}}</ref> although the A-0 compiler functioned more as a loader or [[Linker (computing)|linker]] than the modern notion of a full compiler.<ref>{{ cite conference |last=Hopper|first=Grace|title=Keynote Address|doi=10.1145/800025.1198341 |book-title=Proceedings of the ACM SIGPLAN History of Programming Languages (HOPL) conference, June 1978 | url=https://dl.acm.org/doi/pdf/10.1145/800025.1198341|url-access=subscription}}</ref><ref>{{ cite web |last=Bruderer|first=Herbert|title=Did Grace Hopper Create the First Compiler? |date=21 December 2022 | url=https://cacm.acm.org/blogs/blog-cacm/268001-did-grace-hopper-create-the-first-compiler/fulltext}}</ref><ref>{{cite journal |last1=Strawn |first1=George |last2=Strawn |first2=Candace |title=Grace Hopper: Compilers and Cobol | url = https://www.computer.org/csdl/magazine/it/2015/01/mit2015010062/13rRUxCitFF |journal=IT Professional |date=2015 |volume=17 |issue=Jan.-Feb. 2015 |pages=62–64 |doi=10.1109/MITP.2015.6 |url-access=subscription }}</ref>
* ''1952, before September'': An [[Autocode]] compiler developed by [[Alick Glennie]] for the [[Manchester Mark I]] computer at the University of Manchester is considered by some to be the first compiled programming language.<ref>Knuth, Donald E.; Pardo, Luis Trabb, "Early development of programming languages", Encyclopedia of Computer Science and Technology (Marcel Dekker) 7: 419–493</ref>
* ''1954–1957'': A team led by [[John Backus]] at [[IBM]] developed [[Fortran|FORTRAN]] which is usually considered the first high-level language. In 1957, they completed a FORTRAN compiler that is generally credited as having introduced the first unambiguously complete compiler.<ref>{{Citation |last=Backus |first=John |title=The history of Fortran I, II, and III |date=1978-06-01 |work=History of programming languages |pages=25–74 |url=https://dl.acm.org/doi/10.1145/800025.1198345 |access-date=2024-10-09 |place=New York, NY, USA |publisher=Association for Computing Machinery |doi=10.1145/800025.1198345 |isbn=978-0-12-745040-7|url-access=subscription }}</ref>
* ''1959'': The Conference on Data Systems Language (CODASYL) initiated development of [[COBOL]]. The COBOL design drew on A-0 and FLOW-MATIC. By the early 1960s COBOL was compiled on multiple architectures.
* ''1958–1960'': [[Algol 58]] was the precursor to [[ALGOL 60]]. It introduced [[Block (programming)|code blocks]], a key advance in the rise of [[structured programming]]. ALGOL 60 was the first language to implement [[nested function]] definitions with [[lexical scope]]. It included [[recursion]]. Its syntax was defined using [[Backus–Naur form|BNF]]. ALGOL 60 inspired many languages that followed it. [[Tony Hoare]] remarked: "... it was not only an improvement on its predecessors but also on nearly all its successors."<ref>{{cite web |first=C.A.R. |last=Hoare |title=Hints on Programming Language Design |date=December 1973 |url=http://www.eecs.umich.edu/~bchandra/courses/papers/Hoare_Hints.pdf |archive-url=https://ghostarchive.org/archive/20221010/http://www.eecs.umich.edu/~bchandra/courses/papers/Hoare_Hints.pdf |archive-date=2022-10-10 |url-status=live |page=27}} (This statement is sometimes erroneously attributed to [[Edsger W. Dijkstra]], also involved in implementing the first ALGOL 60 compiler.)</ref><ref name="r3rs">{{cite web |editor1-first=Jonathan |editor1-last=Rees |editor2-first=William |editor2-last=Clinger |author-first1=Hal |author-last1=Abelson |author-first2=R. K. |author-last2=Dybvig |title=Revised(3) Report on the Algorithmic Language Scheme, (Dedicated to the Memory of ALGOL 60)
| url=http://groups.csail.mit.edu/mac/ftpdir/scheme-reports/r3rs-html/r3rs_toc.html
|access-date=2009-10-20
|display-authors=etal}}</ref>
* ''1958–1962'': [[John McCarthy (computer scientist)|John McCarthy]] at [[MIT]] designed [[Lisp (programming language)|LISP]].<ref>"[https://dspace.mit.edu/bitstream/handle/1721.1/6096/AIM-008.pdf?sequence=2 Recursive Functions of Symbolic Expressions and Their Computation by Machine]", Communications of the ACM, April 1960</ref> The symbol processing capabilities provided useful features for artificial intelligence research. In 1962, the LISP 1.5 release noted several tools: an interpreter written by Stephen Russell and Daniel J. Edwards, and a compiler and assembler written by Tim Hart and Mike Levin.<ref>{{cite book |title=Lisp 1.5 Programmers Manual |publisher=The MIT Press |last1=McCarthy |first1=John |last2=Abrahams |first2=Paul W. |last3=Edwards |first3=Daniel J. |last4=Hart |first4=Timothy P. |last5=Levin |first5=Michael I. |url=https://books.google.com/books?id=68j6lEJjMQwC&pg=PR1 |isbn=978-0-26213011-0 |date=1965}}</ref>
 
Early operating systems and software were written in assembly language. In the 1960s and early 1970s, the use of high-level languages for system programming was still controversial due to resource limitations. However, several research and industry efforts began the shift toward high-level systems programming languages, for example, [[BCPL]], [[BLISS]], [[B (programming language)|B]], and [[C (programming language)|C]].
=== Hardware compilation ===
 
[[BCPL]] (Basic Combined Programming Language), designed in 1966 by [[Martin Richards (computer scientist)|Martin Richards]] at the University of Cambridge, was originally developed as a compiler writing tool.<ref>"[http://prog.vub.ac.be/~tjdhondt/ESL/BCPL_to_Cfront_files/p557-richards.pdf BCPL: A tool for compiler writing and system programming]" M. Richards, University Mathematical Laboratory Cambridge, England 1969</ref> Several compilers have been implemented; Richards' book provides insights into the language and its compiler.<ref>BCPL: The Language and Its Compiler, M Richards, Cambridge University Press (first published 31 December 1981)</ref> BCPL was not only an influential systems programming language that is still used in research<ref>The BCPL Cintsys and Cintpos User Guide, M. Richards, 2017</ref> but also provided a basis for the design of the B and C languages.
The output of some compilers may target [[hardware]] at a very low level, for example a [[Field Programmable Gate Array]] (FPGA) or structured [[Application-specific integrated circuit]] (ASIC). Such compilers are said to be ''[[hardware compiler]]s'' or synthesis tools because the programs they compile effectively control the final configuration of the hardware and how it operates; the output of the compilation is not a sequence of instructions to be executed but an interconnection of transistors or lookup tables.
For example, XST is the Xilinx Synthesis Tool used for configuring FPGAs. Similar tools are available from Altera, Synplicity, Synopsys and other vendors.
 
[[BLISS]] (Basic Language for Implementation of System Software) was developed for a Digital Equipment Corporation (DEC) PDP-10 computer by W. A. Wulf's Carnegie Mellon University (CMU) research team. The CMU team went on to develop the BLISS-11 compiler one year later, in 1970.
== Compiler design ==
 
[[Multics]] (Multiplexed Information and Computing Service), a time-sharing operating system project, involved [[MIT]], [[Bell Labs]], [[General Electric]] (later [[Honeywell]]) and was led by [[Fernando J. Corbató|Fernando Corbató]] from MIT.<ref>{{cite web |first1=F. J. |last1=Corbató |last2=Vyssotsky |first2=V. A. |title=Introduction and Overview of the MULTICS System |work=1965 Fall Joint Computer Conference |publisher=Multicians.org |url=https://multicians.org/fjcc1.html}}</ref> Multics was written in the [[PL/I]] language developed by IBM and the IBM User Group.<ref>Report II of the SHARE Advanced Language Development Committee, 25 June 1964</ref> IBM's goal was to satisfy business, scientific, and systems programming requirements. There were other languages that could have been considered, but PL/I offered the most complete solution even though it had not been implemented.<ref>Multicians.org "The Choice of PL/I" article, Editor Tom Van Vleck</ref> For the first few years of the Multics project, a subset of the language could be compiled to assembly language with the Early PL/I (EPL) compiler by Doug McIlroy and Bob Morris from Bell Labs.<ref>"PL/I As a Tool for System Programming", F.J. Corbato, Datamation 6 May 1969 issue</ref> EPL supported the project until a boot-strapping compiler for the full PL/I could be developed.<ref>"[https://www.computer.org/csdl/proceedings/afips/1969/5074/00/50740187.pdf The Multics PL/1 Compiler]", R. A. Freiburghouse, GE, Fall Joint Computer Conference 1969</ref>
In the early days, the approach taken to compiler design used to be directly affected by the complexity of the processing, the experience of the person(s) designing it, and the resources available.
 
Bell Labs left the Multics project in 1969, and developed a system programming language [[B (programming language)|B]] based on BCPL concepts, written by [[Dennis Ritchie]] and [[Ken Thompson]]. Ritchie created a boot-strapping compiler for B and wrote the [[Unix|Unics]] (Uniplexed Information and Computing Service) operating system for a PDP-7 in B. Unics eventually came to be spelled Unix.
A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. When the source language is large and complex, and high quality output is required, the design may be split into a number of relatively independent phases. Having separate phases means development can be parceled up into small parts and given to different people. It also becomes much easier to replace a single phase by an improved one, or to insert new phases later (e.g., additional optimizations).
 
Bell Labs started the development and expansion of [[C (programming language)|C]] based on B and BCPL. The BCPL compiler had been transported to Multics by Bell Labs and BCPL was a preferred language at Bell Labs.<ref>Dennis M. Ritchie, "[https://www.bell-labs.com/usr/dmr/www/chist.pdf The Development of the C Language]", ACM Second History of Programming Languages Conference, April 1993</ref> Initially, a front-end program to Bell Labs' B compiler was used while a C compiler was developed. In 1971, a new PDP-11 provided the resources to define extensions to B and rewrite the compiler. By 1973 the design of the C language was essentially complete and the Unix kernel for a PDP-11 was rewritten in C. Steve Johnson started development of the Portable C Compiler (PCC) to support retargeting of C compilers to new machines.<ref>S.C. Johnson, "A Portable C Compiler: Theory and Practice", 5th ACM POPL Symposium, January 1978</ref><ref>A. Snyder, [https://apps.dtic.mil/sti/pdfs/ADA010218.pdf A Portable Compiler for the Language C], MIT, 1974.</ref>
The division of the compilation processes into phases was championed by the [[Production Quality Compiler-Compiler Project]] (PQCC) at [[Carnegie Mellon]] University. This project introduced the terms ''front end'', ''middle end'', and ''back end''.
 
[[Object-oriented programming]] (OOP) offered some interesting possibilities for application development and maintenance. OOP concepts go further back but were part of [[LISP]] and [[Simula]] language science.<ref>K. Nygaard, University of Oslo, Norway, "[http://www.cs.kent.edu/~durand/CS43101Fall2004/resources/BasicConceptsOOP-Nygaard1986.pdf Basic Concepts in Object Oriented Programming]", SIGPLAN Notices V21, 1986</ref> Bell Labs became interested in OOP with the development of [[C++]].<ref>B. Stroustrup: "What is Object-Oriented Programming?" Proceedings 14th ASU Conference, 1986.</ref> C++ was first used in 1980 for systems programming. The initial design leveraged C language systems programming capabilities with Simula concepts. Object-oriented facilities were added in 1983.<ref>Bjarne Stroustrup, "An Overview of the C++ Programming Language", Handbook of Object Technology (Editor: Saba Zamir, {{ISBN |0-8493-3135-8}})</ref> The Cfront program implemented a C++ front-end for the C84 language compiler. In subsequent years, several C++ compilers were developed as the popularity of C++ grew.
All but the smallest of compilers have more than two phases. However, these phases are usually regarded as being part of the front end or the back end. The point at which these two ''ends'' meet is always open to debate. The front end is generally considered to be where syntactic and semantic processing takes place, along with translation to a lower level of representation (than source code).
 
In many application domains, the idea of using a higher-level language quickly caught on. Because of the expanding functionality supported by newer [[programming language]]s and the increasing complexity of computer architectures, compilers became more complex.
The middle end is usually designed to perform optimizations on a form other than the source code or machine code. This source code/machine code independence is intended to enable generic optimizations to be shared between versions of the compiler supporting different languages and target processors.
 
[[DARPA]] (Defense Advanced Research Projects Agency) sponsored a compiler project with Wulf's CMU research team in 1970. The Production Quality Compiler-Compiler [[PQCC]] design would produce a Production Quality Compiler (PQC) from formal definitions of source language and the target.<ref>Leverett, Cattell, Hobbs, Newcomer, Reiner, Schatz, Wulf: "An Overview of the Production Quality Compiler-Compiler Project", CMU-CS-89-105, 1979</ref> PQCC tried to extend the term compiler-compiler beyond the traditional meaning as a parser generator (e.g., [[Yacc]]) without much success. PQCC might more properly be referred to as a compiler generator.
The back end takes the output from the middle end. It may perform further analysis, transformations and optimizations that are specific to a particular computer. Then, it generates code for a particular processor and OS.
 
PQCC research into code generation process sought to build a truly automatic compiler-writing system. The effort discovered and designed the phase structure of the PQC. The BLISS-11 compiler provided the initial structure.<ref>W. Wulf, K. Nori, "[https://apps.dtic.mil/sti/pdfs/ADA125935.pdf Delayed binding in PQCC generated compilers]", CMU Research Showcase Report, CMU-CS-82-138, 1982
This front-end/middle/back-end approach makes it possible to combine front ends for different [[programming language|languages]] with back ends for different [[CPU]]s. Practical examples of this approach are the [[GNU Compiler Collection]], [[LLVM]], and the [[Amsterdam Compiler Kit]], which have multiple front-ends, shared analysis and multiple back-ends.
</ref> The phases included analyses (front end), intermediate translation to virtual machine (middle end), and translation to the target (back end). TCOL was developed for the PQCC research to handle language specific constructs in the intermediate representation.<ref>Joseph M. Newcomer, David Alex Lamb, Bruce W. Leverett, Michael Tighe, William A. Wulf - Carnegie-Mellon University and David Levine, Andrew H. Reinerit - Intermetrics: "TCOL Ada: Revised Report on An Intermediate Representation for the DOD Standard Programming Language", 1979
</ref> Variations of TCOL supported various languages. The PQCC project investigated techniques of automated compiler construction. The design concepts proved useful in optimizing compilers and compilers for the (since 1995, object-oriented) programming language [[Ada (programming language)|Ada]].
 
The Ada ''STONEMAN'' document{{efn| name=Stoneman|1= [[United States Department of Defense]] (18 February 1980) [https://en.wikisource.org/wiki/Stoneman_requirements Stoneman requirements] }} formalized the program support environment (APSE) along with the kernel (KAPSE) and minimal (MAPSE). An Ada interpreter NYU/ED supported development and standardization efforts with the American National Standards Institute (ANSI) and the International Standards Organization (ISO). Initial Ada compiler development by the U.S. Military Services included the compilers in a complete integrated design environment along the lines of the ''STONEMAN'' document. Army and Navy worked on the Ada Language System (ALS) project targeted to DEC/VAX architecture while the Air Force started on the Ada Integrated Environment (AIE) targeted to IBM 370 series. While the projects did not provide the desired results, they did contribute to the overall effort on Ada development.<ref>William A. Whitaker, "Ada - the project: the DoD High Order Working Group", ACM SIGPLAN Notices (Volume 28, No. 3, March 1991)</ref>
===One-pass versus multi-pass compilers===
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations.
 
Other Ada compiler efforts got underway in Britain at the University of York and in Germany at the University of Karlsruhe. In the U.S., Verdix (later acquired by Rational) delivered the Verdix Ada Development System (VADS) to the Army. VADS provided a set of development tools including a compiler. Unix/VADS could be hosted on a variety of Unix platforms such as DEC Ultrix and the Sun 3/60 Solaris targeted to Motorola 68020 in an Army CECOM evaluation.<ref>CECOM Center for Software Engineering Advanced Software Technology, "Final Report - Evaluation of the ACEC Benchmark Suite for Real-Time Applications", AD-A231 968, 1990</ref> There were soon many Ada compilers available that passed the Ada Validation tests. The Free Software Foundation GNU project developed the [[GNU Compiler Collection]] (GCC), which provides a core capability to support multiple languages and targets. The Ada version [[GNAT]] is one of the most widely used Ada compilers. GNAT is free, but there is also commercial support; for example, AdaCore was founded in 1994 to provide commercial software solutions for Ada. GNAT Pro includes the GNU GCC based GNAT with a tool suite to provide an [[integrated development environment]].
The ability to compile in a [[one-pass compiler|single pass]] is often seen as a benefit because it simplifies the job of writing a compiler and one pass compilers generally compile faster than [[multi-pass compiler]]s. Many languages were designed so that they could be compiled in a single pass (e.g., [[Pascal (programming language)|Pascal]]).
 
High-level languages continued to drive compiler research and development. Focus areas included optimization and automatic code generation. Trends in programming languages and development environments influenced compiler technology. More compilers became included in language distributions (Perl, Java Development Kit) and as a component of an IDE (VADS, Eclipse, Ada Pro). The interrelationship and interdependence of technologies grew. The advent of web services promoted growth of web languages and scripting languages. Scripts trace back to the early days of Command Line Interfaces (CLI) where the user could enter commands to be executed by the system. User Shell concepts developed with languages to write shell programs. Early Windows designs offered a simple batch programming capability. The conventional transformation of these languages used an interpreter. While not widely used, Bash and Batch compilers have been written. More recently, sophisticated interpreted languages became part of the developer's toolkit. Modern scripting languages include PHP, Python, Ruby and Lua. (Lua is widely used in game development.) All of these have interpreter and compiler support.<ref>P.Biggar, E. de Vries, D. Gregg, "A Practical Solution for Scripting Language Compilers", submission to Science of Computer Programming, 2009</ref>
In some cases the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a subsequent pass.
 
"When the field of compiling began in the late 50s, its focus was limited to the translation of high-level language programs into machine code ... The compiler field is increasingly intertwined with other disciplines including computer architecture, programming languages, formal methods, software engineering, and computer security."<ref>M.Hall, D. Padua, K. Pingali, "Compiler Research: The Next 50 Years", ACM Communications 2009 Vol 54 #2</ref> The "Compiler Research: The Next 50 Years" article noted the importance of object-oriented languages and Java. Security and [[parallel computing]] were cited among the future research targets.
The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated [[compiler optimization|optimizations]] needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.
 
== Compiler construction ==
Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
{{more footnotes needed|section|date=December 2019}}
A compiler implements a formal transformation from a high-level source program to a low-level target program. Compiler design can define an end-to-end solution or tackle a defined subset that interfaces with other compilation tools, e.g., preprocessors, assemblers, and linkers. Design requirements include rigorously defined interfaces both internally between compiler components and externally between supporting toolsets.
 
In the early days, the approach taken to compiler design was directly affected by the complexity of the computer language to be processed, the experience of the person(s) designing it, and the resources available. Resource limitations led to the need to pass through the source code more than once.
While the typical multi-pass compiler outputs machine code from its final pass, there are several other types:
 
A compiler for a relatively simple language written by one person might be a single, monolithic piece of software. However, as the source language grows in complexity the design may be split into a number of interdependent phases. Separate phases provide design improvements that focus development on the functions in the compilation process.
*A "[[source-to-source compiler]]" is a type of compiler that takes a high level language as its input and outputs a high level language. For example, an [[Automatic parallelization|automatic parallelizing]] compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations (e.g. [[OpenMP]]) or language constructs (e.g. Fortran's <code>DOALL</code> statements).
*[[Stage compiler]] that compiles to assembly language of a theoretical machine, like some [[Prolog]] implementations
**This Prolog machine is also known as the [[Warren Abstract Machine]] (or WAM). Bytecode compilers for Java, [[Python language|Python]], and many more are also a subtype of this.
*[[Just-in-time compilation|Just-in-time compiler]], used by Smalltalk and Java systems, and also by Microsoft .Net's [[Common Intermediate Language]] (CIL)
**Applications are delivered in bytecode, which is compiled to native machine code just prior to execution.
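As an illustration of bytecode compilation, CPython exposes its own bytecode compiler through the built-in <code>compile()</code> function, and the standard <code>dis</code> module can list the resulting instructions. The following is a minimal sketch; the expression and the variable names <code>price</code> and <code>tax</code> are made up for the example, and the exact opcode names printed differ between Python versions.

```python
import dis

source = "price * (1 + tax / 100)"

# Translate the source text into a code object that holds CPython bytecode.
code = compile(source, "<example>", "eval")

# Collect the opcode names of the generated instructions (version dependent).
opnames = [ins.opname for ins in dis.get_instructions(code)]
print(opnames)

# The virtual machine then executes the bytecode with concrete variable values.
result = eval(code, {"price": 8.0, "tax": 25.0})
print(result)
```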
 
=== One-pass vis-à-vis multi-pass compilers{{anchor|Single-pass}} ===
=== Front end ===
Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing much work and early computers did not have enough memory to contain one program that did all of this work. As a result, compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations.
 
The ability to compile in a [[one-pass compiler|single pass]] has classically been seen as a benefit because it simplifies the job of writing a compiler and one-pass compilers generally perform compilations faster than [[multi-pass compiler]]s. Thus, partly driven by the resource limitations of early systems, many early languages were specifically designed so that they could be compiled in a single pass (e.g., [[Pascal (programming language)|Pascal]]).
The front end analyzes the source code to build an internal representation of the program, called the [[intermediate representation]] or ''IR''. It also manages the [[symbol table]], a data structure mapping each symbol in the source code to associated information such as location, type and scope. This is done over several phases, which include some of the following:
 
In some cases, the design of a language feature may require a compiler to perform more than one pass over the source. For instance, consider a declaration appearing on line 20 of the source which affects the translation of a statement appearing on line 10. In this case, the first pass needs to gather information about declarations appearing after statements that they affect, with the actual translation happening during a subsequent pass.
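A common instance of this situation is a jump to a label that is defined later in the source. The sketch below assumes a hypothetical toy assembly syntax (one instruction per line, labels written as <code>name:</code>); the function name <code>assemble</code> is likewise illustrative. The first pass only records label addresses, and the second pass performs the translation.

```python
def assemble(lines):
    """Two-pass translation of a toy assembly language (hypothetical syntax)."""
    # Pass 1: record the address of every label, including labels that are
    # defined after the instructions that refer to them.
    labels = {}
    address = 0
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = address
        elif line:
            address += 1

    # Pass 2: translate the instructions, replacing label names by addresses.
    output = []
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        op, *args = line.split()
        output.append((op, *[labels.get(arg, arg) for arg in args]))
    return output


program = ["jmp end", "start:", "push 1", "end:", "halt"]
print(assemble(program))
```

A single pass could not translate <code>jmp end</code> directly, because the address of <code>end</code> is not yet known when the jump is read.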
# '''Line reconstruction'''. Languages which [[stropping|strop]] their keywords or allow arbitrary spaces within identifiers require a phase before parsing, which converts the input character sequence to a canonical form ready for the parser. The [[top-down parsing|top-down]], [[recursive descent parser|recursive-descent]], table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. [[Atlas Autocode]], and [[Edinburgh IMP|Imp]] (and some implementations of [[Algol60|Algol]] and [[CORAL66|Coral66]]) are examples of stropped languages whose compilers would have a ''Line Reconstruction'' phase.
# [[Lexical analysis]] breaks the source code text into small pieces called ''tokens''. Each token is a single atomic unit of the language, for instance a [[keyword (computing)|keyword]], [[identifier]] or [[symbol|symbol name]]. The token syntax is typically a [[regular language]], so a [[finite state automaton]] constructed from a [[regular expression]] can be used to recognize it. This phase is also called lexing or scanning, and the software doing lexical analysis is called a [[lexical analyzer]] or scanner.
# [[Preprocessor|Preprocessing]]. Some languages, e.g., [[C (programming language)|C]], require a preprocessing phase which supports [[Macro (computer science)|macro]] substitution and conditional compilation. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as [[Scheme (programming language)|Scheme]] support macro substitutions based on syntactic forms.
# [[Syntax analysis]] involves [[parsing]] the token sequence to identify the syntactic structure of the program. This phase typically builds a [[parse tree]], which replaces the linear sequence of tokens with a tree structure built according to the rules of a [[formal grammar]] which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.
# Semantic analysis is the phase in which the compiler adds semantic information to the [[parse tree]] and builds the symbol table. This phase performs semantic checks such as [[type checking]] (checking for type errors), or [[object binding]] (associating variable and function references with their definitions), or [[definite assignment analysis|definite assignment]] (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the [[parsing]] phase, and logically precedes the [[code generation (compiler)|code generation]] phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.
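Because token syntax is typically a regular language, a lexical analyzer can be sketched with regular expressions alone. The example below is a simplified illustration, not a complete C lexer; the token categories and the names <code>TOKEN_SPEC</code> and <code>tokenize</code> are assumptions made for this sketch.

```python
import re

# Each token category is described by a regular expression. Order matters:
# keywords are listed before identifiers so that "if" is not read as a name.
TOKEN_SPEC = [
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("KEYWORD", r"\b(?:if|else|while)\b"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"\+=|[-+*/<>=]"),
    ("PUNCT", r"[();]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Split the source text into (category, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":        # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = tokenize("if(net>0.0)total+=net*(1.0+tax/100.0);")
for token in tokens:
    print(token)
```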
 
The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated [[compiler optimization|optimizations]] needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.
===Back end===
 
Splitting a compiler up into small programs is a technique used by researchers interested in producing provably correct compilers. Proving the correctness of a set of small programs often requires less effort than proving the correctness of a larger, single, equivalent program.
The term ''back end'' is sometimes confused with ''[[code generation (compiler)|code generator]]'' because of the overlapped functionality of generating assembly code. Some literature uses ''middle end'' to distinguish the generic analysis and optimization phases in the back end from the machine-dependent code generators.
 
=== Three-stage compiler structure ===
The main phases of the back end include the following:
[[File:Compiler design.svg|thumb|center|upright=2.5|Compiler design]]
Regardless of the exact number of phases in the compiler design, the phases can be assigned to one of three stages. The stages include a front end, a middle end, and a back end.
* The ''front end'' scans the input and verifies syntax and semantics according to a specific source language. For [[Type system|statically typed languages]] it performs [[type checking]] by collecting type information. If the input program is syntactically incorrect or has a type error, it generates error and/or warning messages, usually identifying the location in the source code where the problem was detected; in some cases the actual error may be (much) earlier in the program. Aspects of the front end include lexical analysis, syntax analysis, and semantic analysis. The front end transforms the input program into an [[intermediate representation]] (IR) for further processing by the middle end. This IR is usually a lower-level representation of the program with respect to the source code.
* The ''middle end'' performs optimizations on the IR that are independent of the CPU architecture being targeted. This source code/machine code independence is intended to enable generic optimizations to be shared between versions of the compiler supporting different languages and target processors. Examples of middle end optimizations are removal of useless ([[dead-code elimination]]) or unreachable code ([[reachability analysis]]), discovery and propagation of constant values ([[constant propagation]]), relocation of computation to a less frequently executed place (e.g., out of a loop), or specialization of computation based on the context, eventually producing the "optimized" IR that is used by the back end.
* The ''back end'' takes the optimized IR from the middle end. It may perform more analysis, transformations and optimizations that are specific for the target CPU architecture. The back end generates the target-dependent assembly code, performing [[register allocation]] in the process. The back end performs [[instruction scheduling]], which re-orders instructions to keep parallel [[execution unit]]s busy by filling [[delay slot]]s. Although most optimization problems are [[NP-hardness|NP-hard]], [[Heuristic (computer science)|heuristic]] techniques for solving them are well-developed and implemented in production-quality compilers. Typically the output of a back end is machine code specialized for a particular processor and operating system.
 
This front/middle/back-end approach makes it possible to combine front ends for different languages with back ends for different [[Central processing unit|CPUs]] while sharing the optimizations of the middle end.<ref>Cooper and Torczon 2012, p. 8</ref> Practical examples of this approach are the [[GNU Compiler Collection]], [[Clang]] ([[LLVM]]-based C/C++ compiler),<ref name="LattnerBook1st">{{cite book | author = Lattner, Chris |editor = Brown, Amy |editor2=Wilson, Greg | date = 2017 | chapter = LLVM | title = The Architecture of Open Source Applications | chapter-url = http://www.aosabook.org/en/llvm.html | access-date = 28 February 2017 | url-status = live | archive-url = https://web.archive.org/web/20161202070941/http://aosabook.org/en/llvm.html | archive-date = 2 December 2016}}</ref> and the [[Amsterdam Compiler Kit]], which have multiple front-ends, shared optimizations and multiple back-ends.
# [[Compiler analysis|Analysis]]: This is the gathering of program information from the intermediate representation derived from the input. Typical analyses are [[data flow analysis]] to build [[use-define chain]]s, [[dependence analysis]], [[alias analysis]], [[pointer analysis]], [[escape analysis]] etc. Accurate analysis is the basis for any compiler optimization. The [[call graph]] and [[control flow graph]] are usually also built during the analysis phase.
# [[Compiler optimization|Optimization]]: the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are [[inline expansion]], [[dead code elimination]], [[constant propagation]], [[loop transformation]], [[register allocation]] or even [[automatic parallelization]].
# [[Code generation (compiler)|Code generation]]: the transformed intermediate language is translated into the output language, usually the native [[machine language]] of the system. This involves resource and storage decisions, such as deciding which variables to fit into registers and memory and the selection and scheduling of appropriate machine instructions along with their associated addressing modes (see also [[Sethi-Ullman algorithm]]).
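The register-usage decision mentioned above can be illustrated with the labelling step of the Sethi–Ullman algorithm. The sketch below uses a simplified variant that assumes every leaf must first be loaded into a register and that all operators are binary; the tuple-based tree format and the name <code>registers_needed</code> are choices made for this example.

```python
def registers_needed(tree):
    """Return the minimum number of registers needed to evaluate the tree.

    A tree is either a leaf (a variable name) or a tuple (op, left, right).
    """
    if not isinstance(tree, tuple):
        return 1                      # a leaf is loaded into one register
    _, left, right = tree
    l = registers_needed(left)
    r = registers_needed(right)
    # If one subtree needs more registers, evaluate it first and hold its
    # result in one register while the cheaper subtree is evaluated.
    # If both need the same number, one extra register is required.
    return max(l, r) if l != r else l + 1


balanced = ("*", ("+", "a", "b"), ("+", "c", "d"))
skewed = ("+", "a", ("+", "b", ("+", "c", "d")))
print(registers_needed(balanced))
print(registers_needed(skewed))
```

A code generator would use these labels to decide which operand of each operator to evaluate first.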
 
==== Front end ====
Compiler analysis is the prerequisite for any compiler optimization, and the two work closely together. For example, [[dependence analysis]] is crucial for [[loop transformation]].
[[File:Xxx Scanner and parser example for C.gif|thumb|right|400px|[[Lexical analysis|Lexer]] and [[Parsing|parser]] example for [[C (programming language)|C]]. Starting from the sequence of characters "<code>if(net>0.0)total+=net*(1.0+tax/100.0);</code>", the scanner composes a sequence of [[Lexical analysis#token|tokens]], and categorizes each of them, for example as {{color|#600000|identifier}}, {{color|#606000|reserved word}}, {{color|#006000|number literal}}, or {{color|#000060|operator}}. The latter sequence is transformed by the parser into a [[abstract syntax tree|syntax tree]], which is then treated by the remaining compiler phases. The scanner and parser handle the [[regular grammar|regular]] and properly [[context-free grammar|context-free]] parts of the [[C syntax|grammar for C]], respectively.]]
 
The front end analyzes the source code to build an internal representation of the program, called the [[intermediate representation]] (IR). It also manages the [[symbol table]], a data structure mapping each symbol in the source code to associated information such as location, type and scope.
In addition, the scope of compiler analysis and optimizations varies greatly, from as small as a [[basic block]] to the procedure/function level, or even over the whole program ([[interprocedural optimization]]). A compiler can potentially do a better job using a broader view. But that broad view is not free: large scope analysis and optimizations are very costly in terms of compilation time and memory space; this is especially true for interprocedural analysis and optimizations.
 
While the front end can be a single monolithic function or program, as in a [[scannerless parser]], it was traditionally implemented and analyzed as several phases, which may execute sequentially or concurrently. This method is favored due to its modularity and [[separation of concerns]]. Most commonly, the front end is broken into three phases: [[lexical analysis]] (also known as lexing or scanning), [[syntax analysis]] (also known as parsing), and [[Semantic analysis (compilers)|semantic analysis]]. Lexing and parsing comprise the syntactic analysis (word syntax and phrase syntax, respectively), and in simple cases, these modules (the lexer and parser) can be automatically generated from a grammar for the language, though in more complex cases these require manual modification. The lexical grammar and phrase grammar are usually [[context-free grammar]]s, which simplifies analysis significantly, with context-sensitivity handled at the semantic analysis phase. The semantic analysis phase is generally more complex and written by hand, but can be partially or fully automated using [[attribute grammar]]s. These phases themselves can be further broken down: lexing as scanning and evaluating, and parsing as building a [[Parse tree|concrete syntax tree]] (CST, parse tree) and then transforming it into an [[abstract syntax tree]] (AST, syntax tree). In some cases additional phases are used, notably ''line reconstruction'' and ''preprocessing,'' but these are rare.
Interprocedural analysis and optimizations are common in modern commercial compilers from [[Hewlett-Packard|HP]], [[IBM]], [[Silicon Graphics|SGI]], [[Intel]], [[Microsoft]], and [[Sun Microsystems]]. The open source [[GNU Compiler Collection|GCC]] was criticized for a long time for lacking powerful interprocedural optimizations, but it is changing in this respect. Another open source compiler with full analysis and optimization infrastructure is [[Open64]], which is used by many organizations for research and commercial purposes.
 
The main phases of the front end include the following:
* ''{{visible anchor|Line reconstruction}}'' converts the input character sequence to a canonical form ready for the parser. Languages which [[stropping (syntax)|strop]] their keywords or allow arbitrary spaces within identifiers require this phase. The [[top-down parsing|top-down]], [[recursive descent parser|recursive-descent]], table-driven parsers used in the 1960s typically read the source one character at a time and did not require a separate tokenizing phase. [[Atlas Autocode]] and [[Edinburgh IMP|Imp]] (and some implementations of [[ALGOL]] and [[Coral 66]]) are examples of stropped languages whose compilers would have a ''Line Reconstruction'' phase.
* ''[[Preprocessor|Preprocessing]]'' supports [[Macro (computer science)|macro]] substitution and [[conditional compilation]]. Typically the preprocessing phase occurs before syntactic or semantic analysis; e.g. in the case of C, the preprocessor manipulates lexical tokens rather than syntactic forms. However, some languages such as [[Scheme (programming language)|Scheme]] support macro substitutions based on syntactic forms.
* ''[[Lexical analysis]]'' (also known as ''lexing'' or ''tokenization'') breaks the source code text into a sequence of small pieces called ''lexical tokens''.<ref>Aho, Lam, Sethi, Ullman 2007, p. 5-6, 109-189</ref> This phase can be divided into two stages: the ''scanning'', which segments the input text into syntactic units called ''lexemes'' and assigns them a category; and the ''evaluating'', which converts lexemes into a processed value. A token is a pair consisting of a ''token name'' and an optional ''token value''.<ref>Aho, Lam, Sethi, Ullman 2007, p. 111</ref> Common token categories may include identifiers, keywords, separators, operators, literals and comments, although the set of token categories varies in different [[programming language]]s. The lexeme syntax is typically a [[regular language]], so a [[finite-state automaton]] constructed from a [[regular expression]] can be used to recognize it. The software doing lexical analysis is called a [[lexical analyzer]]. This may not be a separate step—it can be combined with the parsing step in [[scannerless parsing]], in which case parsing is done at the character level, not the token level.
* ''[[Syntax analysis]]'' (also known as ''parsing'') involves [[parsing]] the token sequence to identify the syntactic structure of the program. This phase typically builds a [[parse tree]], which replaces the linear sequence of tokens with a tree structure built according to the rules of a [[formal grammar]] which define the language's syntax. The parse tree is often analyzed, augmented, and transformed by later phases in the compiler.<ref>Aho, Lam, Sethi, Ullman 2007, p. 8, 191-300</ref>
* ''[[Semantic analysis (compilers)|Semantic analysis]]'' adds semantic information to the [[parse tree]] and builds the [[symbol table]]. This phase performs semantic checks such as [[type checking]] (checking for type errors), or [[object binding]] (associating variable and function references with their definitions), or [[definite assignment analysis|definite assignment]] (requiring all local variables to be initialized before use), rejecting incorrect programs or issuing warnings. Semantic analysis usually requires a complete parse tree, meaning that this phase logically follows the [[parsing]] phase, and logically precedes the [[code generation (compiler)|code generation]] phase, though it is often possible to fold multiple phases into one pass over the code in a compiler implementation.
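The syntax-analysis and semantic-analysis phases can be sketched for a small expression language. The following is an illustrative recursive-descent parser that builds a syntax tree as nested tuples, followed by a simple semantic check that every variable is bound in a symbol table. The grammar, the tree format and the names <code>parse</code> and <code>check_names</code> are assumptions of this sketch; for brevity the tokenizer silently skips characters it does not recognize.

```python
import re


def parse(text):
    """Parse an arithmetic expression into a nested-tuple syntax tree."""
    tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expr():        # expr := term (('+' | '-') term)*
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())
        return node

    def term():        # term := factor (('*' | '/') factor)*
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():      # factor := NUMBER | IDENT | '(' expr ')'
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        advance()
        if token == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        if token[0].isdigit():
            return ("num", float(token))
        if token[0].isalpha() or token[0] == "_":
            return ("var", token)
        raise SyntaxError(f"unexpected token {token!r}")

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def check_names(tree, symbols):
    """Semantic check: every variable reference must be in the symbol table."""
    kind = tree[0]
    if kind == "num":
        return
    if kind == "var":
        if tree[1] not in symbols:
            raise NameError(f"undeclared identifier {tree[1]!r}")
        return
    check_names(tree[1], symbols)
    check_names(tree[2], symbols)


tree = parse("net*(1.0+tax/100.0)")
check_names(tree, {"net": "float", "tax": "float"})
print(tree)
```

A program such as <code>net*(1.0+rate)</code> is syntactically valid, so <code>parse</code> accepts it, but <code>check_names</code> rejects it when <code>rate</code> is missing from the symbol table.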
 
==== Middle end ====
The middle end, also known as the ''optimizer'', performs optimizations on the intermediate representation in order to improve the performance and the quality of the produced machine code.<ref name="Hjort Blindell, Gabriel">{{Cite book |title=Instruction selection: Principles, methods, and applications |publisher=Springer |last= Blindell |first=Gabriel Hjort |isbn=978-3-31934019-7 |location=Switzerland |oclc=951745657 |date=2016-06-03}}</ref> The middle end contains those optimizations that are independent of the CPU architecture being targeted.
 
The main phases of the middle end include the following:
* [[Compiler analysis|Analysis]]: This is the gathering of program information from the intermediate representation derived from the input; [[data-flow analysis]] is used to build [[use-define chain]]s, together with [[dependence analysis]], [[alias analysis]], [[pointer analysis]], [[escape analysis]], etc. Accurate analysis is the basis for any compiler optimization. The [[control-flow graph]] of every compiled function and the [[call graph]] of the program are usually also built during the analysis phase.
* [[Compiler optimization|Optimization]]: the intermediate language representation is transformed into functionally equivalent but faster (or smaller) forms. Popular optimizations are [[inline expansion]], [[dead-code elimination]], [[constant propagation]], [[loop transformation]] and even [[automatic parallelization]].
Compiler analysis is the prerequisite for any compiler optimization, and the two work closely together. For example, [[dependence analysis]] is crucial for [[loop transformation]].
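
A minimal sketch of one such transformation is constant folding, a simple relative of the constant propagation listed above: any operator whose operands are all known constants is evaluated at compile time. The tuple-based intermediate representation and the function name <code>fold</code> are assumptions made for this example.

```python
def fold(node):
    """Constant folding over a toy expression tree.

    Nodes are ("num", n), ("var", name), ("+", left, right) or ("*", left, right).
    This representation is hypothetical and chosen only for illustration.
    """
    if node[0] in ("num", "var"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        # Both operands are known at compile time, so compute the result now.
        value = left[1] + right[1] if op == "+" else left[1] * right[1]
        return ("num", value)
    return (op, left, right)


# x + 2 * 3 becomes x + 6; the variable x is left for run time.
folded = fold(("+", ("var", "x"), ("*", ("num", 2), ("num", 3))))
```

The transformed tree is functionally equivalent to the original but requires one fewer operation when the program runs.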
 
The scope of compiler analysis and optimization varies greatly; it may range from operating within a [[basic block]], to whole procedures, or even the whole program. There is a trade-off between the granularity of the optimizations and the cost of compilation. For example, [[peephole optimization]]s are fast to perform during compilation but only affect a small local fragment of the code, and can be performed independently of the context in which the code fragment appears. In contrast, [[interprocedural optimization]] requires more compilation time and memory space, but enables optimizations that are only possible by considering the behavior of multiple functions simultaneously.
 
Interprocedural analysis and optimizations are common in modern commercial compilers from [[Hewlett-Packard|HP]], [[IBM]], [[Silicon Graphics|SGI]], [[Intel]], [[Microsoft]], and [[Sun Microsystems]]. The [[free software]] [[GNU Compiler Collection|GCC]] was criticized for a long time for lacking powerful interprocedural optimizations, but it is changing in this respect. Another open source compiler with full analysis and optimization infrastructure is [[Open64]], which is used by many organizations for research and commercial purposes.
 
Due to the extra time and space needed for compiler analysis and optimizations, some compilers skip them by default. Users have to use compilation options to explicitly tell the compiler which optimizations should be enabled.
 
==== Back end ====
The back end is responsible for the CPU architecture specific optimizations and for [[code generation (compiler)|code generation]].<ref name="Hjort Blindell, Gabriel"/>
 
The main phases of the back end include the following:
* ''Machine dependent optimizations'': optimizations that depend on the details of the CPU architecture that the compiler targets.<ref>Cooper and Torczon (2012), p. 540</ref> A prominent example is [[peephole optimization]], which rewrites short sequences of assembler instructions into more efficient instructions.
* ''[[Code generation (compiler)|Code generation]]'': the transformed intermediate language is translated into the output language, usually the native [[machine language]] of the system. This involves resource and storage decisions, such as deciding which variables to fit into [[Register allocation|registers]] and memory and the [[Instruction selection|selection]] and [[Instruction scheduling|scheduling]] of appropriate machine instructions along with their associated [[addressing mode]]s (see also [[Sethi–Ullman algorithm]]). Debug data may also need to be generated to facilitate [[debugging]].
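
The idea of a peephole optimization can be sketched as follows. The instruction set (<code>PUSH</code>, <code>POP</code>, <code>ADD_CONST</code>, and so on) is a hypothetical stack machine invented for this example, and the two rewrite rules assume that the instructions have no side effects such as setting condition flags.

```python
def peephole(instrs):
    """Rewrite short instruction sequences into cheaper equivalents.

    Instructions are tuples such as ("PUSH", 1) or ("POP",) for a
    hypothetical stack machine; real instruction sets differ.
    """
    out = []
    for ins in instrs:
        if out and out[-1][0] == "PUSH" and ins == ("POP",):
            out.pop()          # PUSH x immediately followed by POP has no net effect
        elif ins == ("ADD_CONST", 0):
            continue           # adding zero changes nothing
        else:
            out.append(ins)
    return out


optimized = peephole([("LOAD", "a"), ("ADD_CONST", 0), ("PUSH", 7), ("POP",), ("STORE", "b")])
```

Each rule inspects only the instruction at hand and its immediate predecessor, which is why such optimizations are cheap and independent of the surrounding context.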
 
=== Compiler correctness ===
{{Main|Compiler correctness}}
[[Compiler correctness]] is the branch of software engineering that deals with trying to show that a compiler behaves according to its [[programming language|language specification]].<ref>{{Citation |title=S1-A Simple Compiler |date=2012-02-28 |url=http://dx.doi.org/10.1002/9781118112762.ch12 |work=Compiler Construction Using Java, JavaCC, and Yacc |pages=289–329 |access-date=2023-05-17 |place=Hoboken, NJ, US |publisher=John Wiley & Sons, Inc. |doi=10.1002/9781118112762.ch12 |isbn=978-1-118-11276-2|url-access=subscription }}</ref> Techniques include developing the compiler using [[formal methods]] and using rigorous testing (often called compiler validation) on an existing compiler.
 
== Compiled vis-à-vis interpreted languages ==
 
Higher-level programming languages usually appear with a type of [[Translator (computing)|translation]] in mind: they are designed either as a [[compiled language]] or as an [[interpreted language]]. However, in practice there is rarely anything about a language that ''requires'' it to be exclusively compiled or exclusively interpreted, although it is possible to design languages that rely on re-interpretation at run time. The categorization usually reflects the most popular or widespread implementations of a language – for instance, [[BASIC]] is sometimes called an interpreted language, and C a compiled one, despite the existence of BASIC compilers and C interpreters.<ref>{{Cite web |title=Compiler vs. Interpreter in Programming |url=https://builtin.com/software-engineering-perspectives/compiler-vs-interpreter |access-date=2025-05-25 |website=Built In |language=en}}</ref>

Interpretation does not replace compilation completely. It only hides it from the user and makes it gradual. Even though an interpreter can itself be interpreted, a set of directly executed machine instructions is needed somewhere at the bottom of the execution stack (see [[machine language]]).

Furthermore, for optimization, compilers can contain interpreter functionality, and interpreters may include ahead-of-time compilation techniques. For example, where an expression can be evaluated during compilation and the result inserted into the output program, it does not have to be recalculated each time the program runs, which can greatly speed up the final program. Modern trends toward [[just-in-time compilation]] and [[bytecode|bytecode interpretation]] at times blur the traditional categorizations of compilers and interpreters even further.

Some language specifications spell out that implementations ''must'' include a compilation facility; for example, [[Common Lisp]]. However, there is nothing inherent in the definition of Common Lisp that stops it from being interpreted. Other languages have features that are very easy to implement in an interpreter, but make writing a compiler much harder; for example, [[APL (programming language)|APL]], [[SNOBOL4]],<ref>{{Cite web |date=2023-10-27 |title=SNOBOL4 Programming Language |url=https://sourceforge.net/projects/snobol4/ |access-date=2025-05-25 |website=SourceForge |language=en}}</ref> and many scripting languages allow programs to construct arbitrary source code at runtime with regular string operations, and then execute that code by passing it to a special [[eval|evaluation function]]. To implement these features in a compiled language, programs must usually be shipped with a [[runtime library]] that includes a version of the compiler itself.

== International conferences and organizations ==
<!-- [[International Conference on Compiler Construction]] and [[European Joint Conferences on Theory and Practice of Software]] redirect here -->
Every year, the '''European Joint Conferences on Theory and Practice of Software''' (ETAPS) sponsors the '''International Conference on Compiler Construction''' (CC), with papers from both the academic and industrial sectors.<ref>[http://www.etaps.org/ ETAPS] - European Joint Conferences on Theory and Practice of Software. Cf. "CC" (Compiler Construction) subsection.</ref>
 
== Types ==
One classification of compilers is by the [[Computing platform|platform]] on which their generated code executes. This is known as the ''target platform.''
 
A ''native'' or ''hosted'' compiler is one whose output is intended to directly run on the same type of computer and operating system that the compiler itself runs on. The output of a [[cross compiler]] is designed to run on a different platform. Cross compilers are often used when developing software for [[embedded system]]s that are not intended to support a software development environment.
 
The output of a compiler that produces code for a [[virtual machine]] (VM) may or may not be executed on the same platform as the compiler that produced it. For this reason, such compilers are not usually classified as native or cross compilers.
 
The lower level language that is the target of a compiler may itself be a [[high-level programming language]]. C, viewed by some as a sort of portable assembly language, is frequently the target language of such compilers. For example, [[Cfront]], the original compiler for [[C++]], used C as its target language. The C code generated by such a compiler is usually not intended to be read or maintained by humans, so [[indent style]] and the readability of the intermediate C code are ignored. Some of the features of C that make it a good target language include the [[C preprocessor#Special macros and directives|<code>#line</code>]] directive, which can be generated by the compiler to support [[debugging]] of the original source, and the wide platform support available with C compilers.
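
As a rough sketch of how a compiler targeting C might use the <code>#line</code> directive, the Python function below prefixes each generated C statement with the line number of the source statement it came from. The function name <code>emit_c</code>, the input format, and the file name <code>prog.src</code> are assumptions made for this example.

```python
def emit_c(statements, source_name):
    """Emit C text in which every statement is preceded by a #line directive.

    `statements` is a list of (source_line_number, c_statement) pairs, a
    hypothetical intermediate form used only for this illustration.
    """
    lines = []
    for line_no, stmt in statements:
        # Tell the C compiler which original source line this statement came from.
        lines.append(f'#line {line_no} "{source_name}"')
        lines.append(stmt)
    return "\n".join(lines)


generated = emit_c([(10, "x = 1;"), (12, "y = x + 2;")], "prog.src")
```

With these directives in place, diagnostics and debug information produced by the C compiler refer to lines of the original source file rather than to lines of the generated C code.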
 
While a common compiler type outputs machine code, there are many other types:
* [[Source-to-source compiler]]s are a type of compiler that takes a high-level language as its input and outputs a high-level language. For example, an [[Automatic parallelization|automatic parallelizing]] compiler will frequently take in a high-level language program as an input and then transform the code and annotate it with parallel code annotations (e.g. [[OpenMP]]) or language constructs (e.g. Fortran's <code>DOALL</code> statements). Other terms for a source-to-source compiler are transcompiler or transpiler.<ref>{{cite journal |last1=Ilyushin |first1=Evgeniy |last2=Namiot |first2=Dmitry |date=2016 |title=On source-to-source compilers |url=https://cyberleninka.ru/article/n/on-source-to-source-compilers/pdf |journal=International Journal of Open Information Technologies |volume=4 |issue=5 |pages=48–51 |archive-url=https://web.archive.org/web/20220913223759/https://cyberleninka.ru/article/n/on-source-to-source-compilers/pdf |archive-date=13 September 2022 |access-date=September 14, 2022}}</ref>
* [[Bytecode]] compilers compile to the assembly language of a theoretical machine, as some [[Prolog]] implementations do.
** This Prolog machine is also known as the [[Warren Abstract Machine]] (or WAM).
** Bytecode compilers for [[Java (programming language)|Java]] and [[Python (programming language)|Python]] are also examples of this category.
* [[Just-in-time compilation|Just-in-time compilers]] (JIT compilers) defer compilation until runtime. JIT compilers exist for many modern languages including [[Python (programming language)|Python]], [[JavaScript]], [[Smalltalk]], [[Java (programming language)|Java]], Microsoft [[.NET Framework|.NET]]'s [[Common Intermediate Language]] (CIL) and others. A JIT compiler generally runs inside an interpreter. When the interpreter detects that a code path is "hot", meaning it is executed frequently, the JIT compiler is invoked and compiles the "hot" code for increased performance.
** For some languages, such as Java, applications are first compiled using a bytecode compiler and delivered in a machine-independent [[intermediate representation]]. A bytecode interpreter executes the bytecode, but the JIT compiler will translate the bytecode to machine code when increased performance is necessary.<ref>{{cite journal |author-last=Aycock |author-first=John |date=2003 |title=A Brief History of Just-in-Time |journal=ACM Comput. Surv. |volume=35 |issue=2 |pages=93–113 |doi=10.1145/857076.857077 |s2cid=15345671}}</ref>{{primary source inline|date=March 2017}}
* [[silicon compiler|Hardware compilers]] (also known as synthesis tools) are compilers whose input is a [[hardware description language]] and whose output is a description, in the form of a [[netlist]] or otherwise, of a hardware configuration.
** The output of these compilers targets [[computer hardware]] at a very low level, for example a [[field-programmable gate array]] (FPGA) or structured [[application-specific integrated circuit]] (ASIC).<ref>{{cite book|last1=Swartz|first1=Jordan S.|last2=Betz |first2=Vaughn |last3 =Rose|first3=Jonathan|title=Proceedings of the 1998 ACM/SIGDA sixth international symposium on Field programmable gate arrays - FPGA '98 |chapter=A fast routability-driven router for FPGAs |location=Monterey, CA|publisher=[[Association for Computing Machinery|ACM]]|chapter-url= http://www.eecg.toronto.edu/~vaughn/papers/fpga98.pdf |url-status=live|archive-url=https://web.archive.org/web/20170809012611/http://www.eecg.toronto.edu/~vaughn/papers/fpga98.pdf|archive-date=9 August 2017|date =22-25 February 1998|doi = 10.1145/275107.275134 |pages=140–149|isbn=978-0897919784|s2cid=7128364}}</ref>{{primary source inline|date=March 2017}} Such compilers are said to be hardware compilers, because the source code they compile effectively controls the final configuration of the hardware and how it operates. The output of the compilation is only an interconnection of [[transistor]]s or [[lookup table]]s.
** An example of a hardware compiler is XST, the Xilinx Synthesis Tool used for configuring FPGAs.<ref>{{cite web|author=Xilinx Staff|date=2009|title=XST Synthesis Overview|publisher=Xilinx, Inc.|url=http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/ise_c_using_xst_for_synthesis.htm|access-date=28 February 2017|url-status=live|archive-url=https://web.archive.org/web/20161102004019/http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/ise_c_using_xst_for_synthesis.htm|archive-date=2 November 2016}}</ref>{{primary source inline|date=March 2017}} Similar tools are available from Altera,<ref>{{cite web|author=Altera Staff|date=2017|title=Spectra-Q™ Engine|publisher=Altera.com|url=https://www.altera.com/products/design-software/fpga-design/quartus-prime/features/spectra-q.html|access-date=28 February 2017|url-status=dead|archive-url=https://web.archive.org/web/20161010221724/https://www.altera.com/products/design-software/fpga-design/quartus-prime/features/spectra-q.html|archive-date=10 October 2016}}</ref>{{primary source inline|date=March 2017}} Synplicity, Synopsys and other hardware vendors.{{citation needed|date=March 2017}}
** Research systems compile subsets of high level serial languages, such as Python or C++, directly into parallelized digital logic. This is typically easier to do for functional languages or functional subsets of multi-paradigm languages.<ref>{{cite conference |last1=Jurkans |first1=K |last2=Fox |first2=C |title=Python Subset to Digital Logic Dataflow Compiler for Robots and IoT |conference=IEEE International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom-2023) |year=2023}}</ref>
* A program that translates from a low-level language to a higher level one is a [[decompiler]].<ref>{{cite book |doi=10.1016/B978-1-59749-574-5.00003-9 |chapter=Tools of the Trade |title=Managed Code Rootkits |date=2011 |last1=Metula |first1=Erez |pages=39–62 |isbn=978-1-59749-574-5 }}</ref>
* A program that translates into an object code format that is not supported on the compilation machine is called a [[cross compiler]] and is commonly used to prepare code for execution on embedded systems.<ref>{{Cite web |last=Chandrasekaran |first=Siddharth |date=2018-01-26 |title=Cross Compilation Demystified |url=https://embedjournal.com/cross-compilation-demystified/ |access-date=2023-03-05 |website=embedjournal.com |language=en}}</ref>{{better source needed|date=March 2023}}
* A program that rewrites object code back into the same type of object code while applying optimisations and transformations is a [[binary recompiler]].
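
Bytecode compilation can be observed directly in CPython, whose built-in <code>compile</code> function translates source text into a code object for the interpreter's virtual machine. The sketch below assumes CPython; the variable names are chosen for illustration, and the exact instruction names reported by the <code>dis</code> module differ between Python versions.

```python
import dis

source = "total = price * quantity"

# Compile the source text to a code object holding bytecode.
code = compile(source, "<example>", "exec")

# The bytecode interpreter executes the compiled code object.
namespace = {"price": 3, "quantity": 4}
exec(code, namespace)

# List the names of the virtual-machine instructions that were generated.
opnames = [ins.opname for ins in dis.get_instructions(code)]
```

After execution, <code>namespace["total"]</code> holds the product, and <code>opnames</code> lists the instructions of the virtual machine rather than those of any physical CPU.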
 
''Assemblers,'' which translate human readable [[assembly language]] to the [[machine code]] instructions executed by hardware, are not considered compilers.<ref>Calingaert and Horowitz 1979, pp. 186-187</ref>{{efn|"The many source-language features described in the preceding section result in a number of salient differences between compilers and assemblers. On any one item the distinction may not be clear-cut. Moreover, it may be difficult to distinguish a simple compiler from a powerful macro assembler. Nevertheless, the differences are usually substantial enough that there remains a qualitative distinction between assemblers and compilers."}} (The inverse program that translates machine code to assembly language is called a [[disassembler]].)
 
== See also ==
{{Portal|Computer programming}}
{{div col|colwidth=22em}}
* [[Abstract interpretation]]
* [[Assembly language#Assembler|Assembler]]
* [[Attribute grammar]]
* [[Bottom-up parsing]]
* [[Compile and go system]]
* [[Compile farm]]
* [[History of compiler writing]]
* [[Just-in-time compilation]]
* [[Linker (computing)|Linker]]
* [[List of compilers]]
* [[List of important publications in computer science#Compilers]]
* [[Metacompilation]]
* [[Program transformation]]
{{div col end}}
 
== Notes and references ==
{{Notelist}}
{{Reflist|refs=
<ref name="Hellige_2004">{{cite book |editor-last=Hellige |editor-first=Hans Dieter |title=Geschichten der Informatik - Visionen, Paradigmen, Leitmotive. |language=de |publisher=[[Springer-Verlag]] |date=2004 |orig-date=November 2002 |edition=1 |isbn=978-3-540-00217-8 |id={{ISBN|3-540-00217-0}} |location=Bremen, Germany |publication-place=Berlin / Heidelberg, Germany |doi=10.1007/978-3-642-18631-8 |pages=45, 104, 105}} (xii+514 pages)</ref>
<ref name="Rutishauser_1951">{{cite journal |author-last=Rutishauser |author-first=Heinz |author-link=Heinz Rutishauser |title=Über automatische Rechenplanfertigung bei programmgesteuerten Rechenanlagen |language=de |journal=[[Zeitschrift für Angewandte Mathematik und Mechanik]] |volume=31 |page=255 |date=1951 |doi=10.1002/zamm.19510310820}}</ref>
<ref name="Fothe-Wilke_2014">{{cite book |title=Keller, Stack und automatisches Gedächtnis – eine Struktur mit Potenzial |language=de |trans-title=Cellar, stack and automatic memory - a structure with potential |editor-first1=Michael |editor-last1=Fothe |editor-first2=Thomas |editor-last2=Wilke |type=Tagungsband zum Kolloquium 14. November 2014 in Jena |location=Jena, Germany |volume=T-7 |series=GI Series: Lecture Notes in Informatics (LNI) – Thematics |publisher=[[Gesellschaft für Informatik]] (GI) / Köllen Druck + Verlag GmbH |isbn=978-3-88579-426-4 |issn=1614-3213 |date=2015 |orig-date=2014-11-14 |publication-place=Bonn, Germany |pages=20–21 |url=https://dl.gi.de/bitstream/handle/20.500.12116/4381/lni-t-7.pdf?sequence=1&isAllowed=y |access-date=2020-04-12 |url-status=live |archive-url=https://web.archive.org/web/20200412122706/https://dl.gi.de/bitstream/handle/20.500.12116/4381/lni-t-7.pdf?sequence=1&isAllowed=y |archive-date=2020-04-12}} [https://web.archive.org/web/20221210100112/https://dl.gi.de/handle/20.500.12116/4374/browse?type=title&sort_by=4] (77 pages)</ref>
}}
 
== Further reading ==
{{Refbegin}}
* {{cite book |author-link1=Alfred V. Aho |author-last1=Aho |author-first1=Alfred V. |author-link2 = Ravi Sethi |author-last2=Sethi |author-first2=Ravi |author-link3=Jeffrey D. Ullman |author-last3=Ullman |author-first3=Jeffrey D. |title=Compilers: Principles, Techniques, and Tools |isbn=9780201100884 |publisher=[[Addison-Wesley]] |date=1986 |edition=1st |title-link=Compilers: Principles, Techniques, and Tools}}
* {{cite journal |author-link=Frances E. Allen |author-last=Allen |author-first=Frances E. |title=A History of Language Processor Technology in IBM |journal=IBM Journal of Research and Development |volume=25 |pages=535–548 |number=5 |date=September 1981 |publisher=[[IBM]] |doi=10.1147/rd.255.0535}}
* {{cite book |author-last1=Allen |author-first1=Randy |author-link2=Ken Kennedy (computer scientist) |author-last2=Kennedy |author-first2=Ken |title=Optimizing Compilers for Modern Architectures |publisher=[[Morgan Kaufmann Publishers]] |date=2001 |isbn=978-1-55860-286-1}}
* {{cite book |author-link=Andrew Appel |author-last=Appel |author-first=Andrew Wilson |title=Modern Compiler Implementation in Java |edition=2nd |publisher=[[Cambridge University Press]] |date=2002 |isbn=978-0-521-82060-8}}
* {{cite book |author-link=Andrew Appel |author-last=Appel |author-first=Andrew Wilson |url=https://books.google.com/books?id=8APOYafUt-oC |title=Modern Compiler Implementation in ML |publisher=[[Cambridge University Press]] |date=1998 |isbn=978-0-521-58274-2}}
* {{cite book |author-last=Bornat |author-first=Richard |title=Understanding and Writing Compilers: A Do It Yourself Guide |date=1979 |publisher=[[Macmillan Publishing]] |isbn=978-0-333-21732-0 |url=http://www.cs.mdx.ac.uk/staffpages/r_bornat/books/compiling.pdf |author-link=Richard Bornat |access-date=11 April 2007 |archive-date=15 June 2007 |archive-url=https://web.archive.org/web/20070615132948/http://www.cs.mdx.ac.uk/staffpages/r_bornat/books/compiling.pdf |url-status=dead}}
* {{cite book |title=Assemblers, Compilers, and Program Translation |author-first=Peter |author-last=Calingaert |editor-first=Ellis |editor-last=Horowitz |editor-link=Ellis Horowitz |date=1979 |series=Computer software engineering series |publisher=[[Computer Science Press, Inc.]] |location=Potomac, Maryland |edition=1st printing, 1st |isbn=0-914894-23-4 |issn=0888-2088 |lccn=78-21905 |url=https://archive.org/details/assemblerscompil00cali |url-access=registration |access-date=2020-03-20}} (2+xiv+270+6 pages)
* {{cite book |title=Engineering a compiler |author-last1=Cooper |author-first1=Keith Daniel |author-last2=Torczon |author-first2=Linda |date=2012 |publisher=Elsevier/Morgan Kaufmann |isbn=978-0-12088478-0 |edition=2nd |location=Amsterdam, Netherlands |pages=8 |oclc=714113472}}
* {{cite book |author-last=Gries |author-first=David |author-link=David Gries |date=1971 |title=Compiler Construction for Digital Computers |publisher=John Wiley and Sons |location=New York |isbn=0-471-32776-X |language=English, Spanish, Japanese, Chinese, Italian, Russian |quote=The first text on compiler construction.}}
* Leverett; Cattell; Hobbs; Newcomer; Reiner; Schatz; Wulf, ''An Overview of the Production Quality Compiler-Compiler Project'', in ''[[Computer (magazine)|Computer]]'' 13(8):38-49 (August 1980)
* {{cite book |author-last1=McKeeman |author-first1=William Marshall |author-link=William M. McKeeman |author-link2=Jim Horning |author-last2=Horning |author-first2=James J. |author-last3=Wortman |author-first3=David B. |url=https://archive.org/details/compilergenerato00mcke |title=A Compiler Generator |location=[[Englewood Cliffs, NJ]] |publisher=[[Prentice-Hall]] |date=1970 |isbn=978-0-13-155077-3}}
* {{cite book |author-link=Steven Muchnick |author-last1=Muchnick |author-first1=Steven |url=https://archive.org/details/advancedcompiler00much |url-access=registration |title=Advanced Compiler Design and Implementation |publisher=[[Morgan Kaufmann Publishers]] |date=1997 |isbn=978-1-55860-320-2}}
* {{cite book |author-link=Michael L. Scott |author-last=Scott |author-first=Michael Lee |url=https://books.google.com/books?id=4LMtA2wOsPcC |title=Programming Language Pragmatics |publisher=[[Morgan Kaufmann]] |date=2005 |edition=2nd |isbn=978-0-12-633951-2}}
* {{cite book |author-last1=Srikant |author-first1=Y. N. |author-last2=Shankar |author-first2=Priti |url=https://books.google.com/books?id=0K_jIsgyNpoC&printsec=frontcover |title=The Compiler Design Handbook: Optimizations and Machine Code Generation |publisher=[[CRC Press]] |date=2003 |isbn=978-0-8493-1240-3}}
* {{cite book |author-last=Terry |author-first=Patrick D. |url=http://scifac.ru.ac.za/compilers/conts.htm |title=Compilers and Compiler Generators: An Introduction with C++ |publisher=International Thomson Computer Press |date=1997 |isbn=978-1-85032-298-6}}
* {{cite book |author-link=Niklaus Wirth |author-last=Wirth |author-first=Niklaus |url=http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf |title=Compiler Construction |isbn=978-0-201-40353-4 |publisher=[[Addison-Wesley]] |date=1996 |access-date=24 April 2012 |archive-url=https://web.archive.org/web/20170217071020/http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf |archive-date=17 February 2017 |url-status=dead}}
* {{cite web |author=LLVM community |title=The LLVM Target-Independent Code Generator |url=http://llvm.org/docs/CodeGenerator.html#built-in-register-allocators |website=LLVM Documentation |access-date=17 June 2016}}
* [https://web.archive.org/web/20150103161301/http://www.informatik.uni-trier.de/~ley/db/books/compiler/index.html Compiler textbook references] A collection of references to mainstream Compiler Construction Textbooks
{{refend}}
 
== External links ==
{{Wiktionary|compiler}}
{{Wikibooks|Compiler Construction}}
{{Commons category|Compilers}}
* [http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf Incremental Approach to Compiler Construction]{{snd}}a PDF tutorial
* [http://compilers.iecc.com/ The comp.compilers newsgroup and RSS feed]
* {{webarchive |url=https://web.archive.org/web/20180515111448/http://www.diku.dk:80/hjemmesider/ansatte/torbenm/Basics/|date=15 May 2018|title=Basics of Compiler Design}}
* [http://www.jiscmail.ac.uk/lists/hwcomp.html Hardware compilation mailing list]
* {{YouTube|_C5AHaS1mOA|Short animation}} explaining the key conceptual difference between compilers and interpreters
* [http://www.onyxbits.de/content/blog/patrick/introduction-compiler-construction-using-flex-and-yacc Practical introduction to compiler construction using flex and yacc]
* {{YouTube|id=QPCC2sbukeo|title=Syntax Analysis & LL1 Parsing}}
* [http://compilers.iecc.com/crenshaw/ Let's Build a Compiler], by Jack Crenshaw
* {{webarchive |url=https://web.archive.org/web/20141010102940/http://www.compdev.net/|date=10 October 2014|title=Forum about compiler development}}
 
{{Authority control}}
{{Cat also|Compiler| Computer libraries | Programming language implementation}}
 
{{Computer science}}
[[Category:American inventions]]
[[Category:Compilers| ]]
[[Category:Compiler theory]]
[[Category:Computer libraries]]
[[Category:Programming language implementation]]
[[Category:Utility software types]]

[[af:Vertalerkonstruksie]]
[[ar:مصرف (برمجة)]]
[[an:Compilador]]
[[ast:Compilador]]
[[be:Кампілятар]]
[[bs:Kompajler]]
[[bg:Компилатор]]
[[ca:Compilador]]
[[cs:Překladač]]
[[da:Compiler]]
[[de:Compiler]]
[[et:Kompilaator]]
[[el:Μεταγλωττιστής]]
[[es:Compilador]]
[[eo:Tradukilo]]
[[eu:Konpiladore]]
[[fa:مترجم (رایانه)]]
[[fr:Compilateur]]
[[gl:Compilador]]
[[ko:컴파일러]]
[[hi:कम्पाइलर]]
[[hsb:Kompilator]]
[[hr:Jezični prevoditelj]]
[[id:Kompilator]]
[[is:Þýðandi (tölvunarfræði)]]
[[it:Compilatore]]
[[he:מהדר]]
[[ka:კომპილატორი]]
[[lv:Kompilators]]
[[lt:Kompiliatorius]]
[[hu:Fordítóprogram]]
[[mk:Компајлер]]
[[ml:കംപൈലര്‍]]
[[ms:Penyusun]]
[[nl:Compiler]]
[[ja:コンパイラ]]
[[no:Kompilator]]
[[pl:Kompilator]]
[[pt:Compilador]]
[[ro:Compilator]]
[[ru:Компилятор]]
[[simple:Compiler]]
[[sk:Kompilátor (programovanie)]]
[[sl:Prevajalnik]]
[[sr:Компилатор]]
[[fi:Ohjelmointikielen kääntäjä]]
[[sv:Kompilator]]
[[ta:நிரல்மொழிமாற்றி]]
[[te:కంపైలర్]]
[[th:โปรแกรมแปลโปรแกรม]]
[[tr:Derleyici]]
[[uk:Компілятор]]
[[vi:Trình biên dịch]]
[[yi:קאמפיילער]]
[[zh:編譯器]]