Recursive descent parser: Difference between revisions

Content deleted Content added
Arto B (talk | contribs)
 
(267 intermediate revisions by more than 100 users not shown)
Line 1:
{{short description|Algorithm}}
A '''recursive descent parser''' is a top-down [[parsing|parser]] built from a set of [[Mutual recursion|mutually-recursive]] procedures (or a non-recursive equivalent) where each such [[procedure]] usually implements one of the production rules of the [[formal grammar|grammar]]. Thus the structure of the resulting program closely mirrors that of the grammar it recognises.
{{More footnotes|date=February 2009}}
 
AIn [[computer science]], a '''recursive descent parser''' is a kind of [[top-down [[parsing|top-down parser]] built from a set of [[Mutualmutual recursion|mutually- recursive]] procedures (or a non-recursive equivalent) where each such [[procedure (computer science)|procedure]] usually implements one of the production[[Terminal rulesand nonterminal symbols|nonterminals]] of the [[formal grammar|grammar]]. Thus the structure of the resulting program closely mirrors that of the grammar it recognisesrecognizes.<ref>{{FOLDOC|Recursive+descent+parser}}</ref><ref>{{cite book | title=Recursive Programming Techniques | author=Burge, W.H. | year=1975 | publisher=Addison-Wesley Publishing Company | isbn=0-201-14450-6 | url-access=registration | url=https://archive.org/details/recursiveprogram0000burg }}</ref>
A '''predictive parser''' is a recursive descent parser with no backup. Predictive parsing is possible only for the class of [[LL parser|LL(k)]] grammars, which are the context-free grammars for which there exists some positive integer k that allows a recursive descent parser to decide which production to use by examining only the next k tokens of input. (The [[LL parser|LL(k)]] grammars therefore exclude all ambiguous grammars, as well as all grammars that contain left recursion. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an [[LL parser|LL(k)]] grammar.) A predictive parser runs in [[linear time]].
 
A '''predictive parser''' is a recursive descent parser withthat nodoes backupnot require [[backtracking]].<ref name="Watson2017">{{cite book|last=Watson|first=Des|title=A Practical Approach to Compiler Construction|url=https://books.google.com/books?id=05B0DgAAQBAJ&q=%22predictive+parser%22|date=22 March 2017|publisher=Springer|isbn=978-3-319-52789-5}}</ref> Predictive parsing is possible only for the class of [[LL parser|LL(''k'')]] grammars, which are the [[context-free grammarsgrammar]]s for which there exists some positive integer ''k'' that allows a recursive descent parser to decide which production to use by examining only the next ''k'' tokens of input. (The [[LL parser|LL(''k'')]] grammars therefore exclude all [[ambiguous grammarsgrammar]]s, as well as all grammars that contain [[left recursion]]. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an [[LL parser|LL(''k'')]] grammar.) A predictive parser runs in [[linear time]].
Recursive descent with backup is a technique that determines which production to use by trying each production in turn. Recursive descent with backup is not limited to [[LL parser|LL(k)]] grammars, but is not guaranteed to terminate unless the grammar is [[LL parser|LL(k)]]. Even when they terminate, parsers that use recursive descent with backup may require [[exponential time]].
 
Recursive descent with backupbacktracking is a technique that determines which [[Production rule (formal languages)|production]] to use by trying each production in turn. Recursive descent with backupbacktracking is not limited to [[LL parser|LL(''k'')]] grammars, but is not guaranteed to terminate unless the grammar is [[LL parser|LL(''k'')]]. Even when they terminate, parsers that use recursive descent with backupbacktracking may require [[exponential time]].
Although predictive parsers are widely used, programmers often prefer to create [[LR parser|LR]] or [[LALR parser|LALR]] parsers via parser generators without transforming the grammar into [[LL parser|LL(k)]] form.
 
Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a [[parser generator]],{{Citation needed|date=February 2018}} either for an LL(''k'') language or using an alternative parser, such as [[LALR parser|LALR]] or [[LR parser|LR]]. This is particularly the case if a grammar is not in [[LL parser|LL(''k'')]] form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like [[ANTLR]].
Some authors define the recursive descent parsers as the predictive parsers. Other authors use the term more broadly, to include backed-up recursive descent.{{citationneeded}}
 
Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.<ref>{{cite book|last1=Aho|first1=Alfred V.|last2=Sethi|first2=Ravi|last3=Ullman|first3=Jeffrey|authorlink1=Alfred V. Aho|authorlink3=Jeffrey Ullman|title=Compilers: Principles, Techniques and Tools|url=https://archive.org/details/compilers00ahoa|url-access=limited|date=1986|publisher=Addison Wesley|page=[https://archive.org/details/compilers00ahoa/page/n193 183]|edition=first}}</ref>
== Example parser ==
The following [[EBNF]] [[grammar]] (for [[Niklaus Wirth]]'s [[PL/0]] programming language, from
[[Algorithms + Data Structures = Programs]]) is in [[LL parser|LL(1)]] form (for simplicity, ''ident''
and ''number'' are assumed to be terminals):
 
== Example parser ==
<pre>
The following [[Extended Backus–Naur Form|EBNF]]-like [[formal grammar|grammar]] (for [[Niklaus Wirth]]'s [[PL/0]] programming language, from ''[[Algorithms + Data Structures = Programs]]'') is in [[LL parser|LL(1)]] form:
 
<syntaxhighlight lang="ebnf">
program = block "." .
Line 23 ⟶ 24:
statement =
[ident ":=" expression
| "call" ident
| "begin" statement {";" statement } "end"
| "if" condition "then" statement
| "while" condition "do" statement .
] .
condition =
"odd" expression
| expression ("="|"#"|"<"|"<="|">"|">=") expression .
.
expression = ["+"|"-"] term {("+"|"-") term} .
Line 39 ⟶ 38:
term = factor {("*"|"/") factor} .
factor =
ident
| number
| "(" expression ")" .
</syntaxhighlight>
</pre>
 
[[Terminalterminal symbol|Terminals]] are expressed in quotes. Each [[nonterminal symbol|nonterminal]] is defined by a rule in the grammar, (except for ''ident'' and ''number'', which are assumed to be implicitly defined.
and ''number''). Each [[nonterminal]] is defined by a rule in the grammar.
 
===C implementation===
Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the next symbol from the input, and the global function ''getsym'', which updates ''sym'' when called.
 
What follows is an implementation of a recursive descent parser for the above language in [[C (programming language)|C]]. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.
<pre>
 
Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the nextcurrent symbol from the input, and the global function ''getsymnextsym'', which updates ''sym'' when called.
 
The implementations of the functions ''nextsym'' and ''error'' are omitted for simplicity.
 
<syntaxhighlight lang="c">
typedef enum {ident, number, lparen, rparen, times, slash, plus,
minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,
Line 54 ⟶ 61:
 
Symbol sym;
void getsymnextsym(void);
void error(const char msg[]);
void expression(void);
 
int accept(Symbol s) {
if (sym == s) {
getsymnextsym();
return 1;
}
Line 83 ⟶ 89:
} else {
error("factor: syntax error");
getsymnextsym();
}
}
Line 90 ⟶ 96:
factor();
while (sym == times || sym == slash) {
getsymnextsym();
factor();
}
Line 97 ⟶ 103:
void expression(void) {
if (sym == plus || sym == minus)
getsymnextsym();
term();
while (sym == plus || sym == minus) {
getsymnextsym();
term();
}
Line 110 ⟶ 116:
} else {
expression();
if (sym == eql || sym == neq || sym == lss || sym == leq || sym == gtr || sym == geq) {
sym == leq || sym == gtr || sym == geqnextsym() {;
getsym();
expression();
} else {
error("condition: invalid operator");
getsymnextsym();
}
}
Line 140 ⟶ 145:
expect(dosym);
statement();
} ]else .{
error("statement: syntax error");
getsymnextsym();
}
}
Line 168 ⟶ 176:
 
void program(void) {
getsymnextsym();
block();
expect(period);
}
</syntaxhighlight>
</pre>
 
== See alsoExamples ==
== Implementation in functional languages ==
*[[ANTLR]] - aSome recursive descent parser generatorgenerators:
Recursive descent parsers are particularly easy to implement in functional languages such as [[Haskell programming language|Haskell]]. See [http://www.cs.nott.ac.uk/~gmh/pearl.pdf Functional Pearls: Monadic Parsing in Haskell] (pdf format).
*[[TMG (language)|TMG]] – an early compiler-compiler used in the 1960s and early 1970s
*[[JavaCC]]
*[[Coco/R]]
*[[ANTLR]]
*[[Spirit Parser Framework]] – a C++ recursive descent parser generator framework requiring no pre-compile step
*[[parboiled (Java)]] – a recursive descent [[Parsing expression grammar|PEG]] parsing library for [[Java (programming language)|Java]]
 
The C++ front-end of the [[Clang]] compiler contains a hand-written parser based on the recursive-descent parsing algorithm. <ref>How Clang handles the type / variable name ambiguity of C/C++
== See also ==
https://eli.thegreenplace.net/2012/07/05/how-clang-handles-the-type-variable-name-ambiguity-of-cc/</ref>
*[[Coco/R]] - a recursive descent parser generator
*[[ANTLR]] - a recursive descent parser generator
*[http://www.codecodex.com/wiki/index.php?title=Recursive_descent_parsing CodeCodex: Recursive descent parsing] - sample recursive descent parser generator code for C, Java and OCaml
 
==See References also==
*[[Parser combinator]] – a higher-order function used in combinatory parsing, a method of factoring recursive descent parser designs
* The [[Dragon book]], Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4.
*[[Parsing expression grammar]] – another form representing recursive descent grammar
* ''Modern Compiler Implementation in Java, Second Edition'', Andrew Appel, 2002, ISBN 0-521-82060-X.
*[[Recursive ascent parser]]
* ''Recursive Programming Techniques'', W.H. Burge, 1975, ISBN 0-201-14450-6
*[[Coco/RTail recursive parser]] - a variant of the recursive descent parser generator
* ''Crafting a Compiler with C'', Charles N Fischer and Richard J LeBlanc, Jr, 1991, ISBN 0-8053-2166-7.
* [http://search.cpan.org/~dconway/Parse-RecDescent-1.94/lib/Parse/RecDescent.pod Parse::RecDescent]: A versatile recursive descent [[Perl]] module.
* [http://pyparsing.sourceforge.net/ pyparsing]: A versatile recursive descent [[Python (programming language)|Python]] module.
* ''Compiling with C# and Java'', Pat Terry, 2005, ISBN/ISSN: 0-321-26360-X 624
* ''Algorithms + Data Structures = Programs'', Niklaus Wirth, 1975, ISBN 0-13-022418-9
* ''Compiler Construction'', Niklaus Wirth, 1996, ISBN 0-201-40353-6
{{FOLDOC}}
 
==References==
[[Category:Parsing algorithms]]
{{Reflist}}
[[Category:Compilers|*]]
[[Category:Parsers]]
 
===General references===
[[de:Rekursiver Abstieg]]
* The ''[[DragonCompilers: bookPrinciples, Techniques, and Tools]]'', first edition, Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4.
[[ko:되부름 하향 구문 분석]]
* ''Modern Compiler Implementation in Java, Second Edition'', Andrew Appel, 2002, {{ISBN |0-521-82060-X}}.
* ''Recursive Programming Techniques'', W.H. Burge, 1975, {{ISBN |0-201-14450-6}}
* ''Crafting a Compiler with C'', Charles N Fischer and Richard J LeBlanc, Jr, 1991, {{ISBN |0-8053-2166-7}}.
* ''Compiling with C# and Java'', Pat Terry, 2005, {{ISBN/ISSN: |0-321-26360-X}}, 624
* ''Algorithms + Data Structures = Programs'', Niklaus Wirth, 1975, {{ISBN |0-13-022418-9}}
* ''Compiler Construction'', Niklaus Wirth, 1996, {{ISBN |0-201-40353-6}}
 
==External links==
*[https://compilers.iecc.com/crenshaw/ Jack W. Crenshaw: ''Let's Build A Compiler'' (1988-1995)], in [[Pascal (programming language)|Pascal]], with [[assembly language]] output, using a "keep it simple" approach
 
{{Parsers}}
 
[[Category:Parsing algorithms]]
[[Category:Articles with example C code]]