Recursive descent parser: Difference between revisions

Content deleted Content added
References: fix typo
 
(292 intermediate revisions by more than 100 users not shown)
Line 1:
{{short description|Algorithm}}
A '''recursive descent parser''' is a top-down [[parser]] built from a set of [[Mutual recursion|mutually-recursive]] procedures (or a non-recursive equivalent) where each such [[procedure]] usually implements one of the production rules of the [[formal grammar|grammar]]. Thus the structure of the resulting program closely mirrors that of the grammar it recognises.
{{More footnotes|date=February 2009}}
 
In [[computer science]], a '''recursive descent parser''' is a kind of [[top-down parsing|top-down parser]] built from a set of [[mutual recursion|mutually recursive]] procedures (or a non-recursive equivalent) where each such [[procedure (computer science)|procedure]] implements one of the [[Terminal and nonterminal symbols|nonterminals]] of the [[formal grammar|grammar]]. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.<ref>{{FOLDOC|Recursive+descent+parser}}</ref><ref>{{cite book | title=Recursive Programming Techniques | author=Burge, W.H. | year=1975 | publisher=Addison-Wesley Publishing Company | isbn=0-201-14450-6 | url-access=registration | url=https://archive.org/details/recursiveprogram0000burg }}</ref>
Recursive descent parsers recognize the class of [[LL parser|LL(k)]] grammars for unbounded ''k''. In less formal terms, they recognize all grammars except those that are [[left recursion|left recursive]]. The running time of a recursive descent parser in the general case is [[exponential time|exponential]], although there is a modification to the algorithm called a [[packrat parser]], which trades memory for [[linear time]]. For a large class of grammars, it is also possible to eliminate left recursion and backtracking.
 
A ''predictive parser'' is a recursive descent parser that does not require [[backtracking]].<ref name="Watson2017">{{cite book|last=Watson|first=Des|title=A Practical Approach to Compiler Construction|url=https://books.google.com/books?id=05B0DgAAQBAJ&q=%22predictive+parser%22|date=22 March 2017|publisher=Springer|isbn=978-3-319-52789-5}}</ref> Predictive parsing is possible only for the class of [[LL parser|LL(''k'')]] grammars, which are the [[context-free grammar]]s for which there exists some positive integer ''k'' that allows a recursive descent parser to decide which production to use by examining only the next ''k'' tokens of input. The LL(''k'') grammars therefore exclude all [[ambiguous grammar]]s, as well as all grammars that contain [[left recursion]]. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(''k'') grammar. A predictive parser runs in [[linear time]].
Recursive descent parsers are used less often in practice than [[LR parser|LR]] or [[LALR parser|LALR]] parsers which are easier to write using parser generators. However, recursive descent parsers have the advantage that they can pass information down to subrules, allowing for parameterized [[nonterminal]]s. This allows these parsers to match a certain class of non-[[context-free grammar|context-free]] languages.
 
Recursive descent with backtracking is a technique that determines which [[Production rule (formal languages)|production]] to use by trying each production in turn. Recursive descent with backtracking is not limited to LL(''k'') grammars, but is not guaranteed to terminate unless the grammar is LL(''k''). Even when they terminate, parsers that use recursive descent with backtracking may require [[exponential time]].
== Example parser ==
Consider the following [[formal grammar|grammar]] in [[Backus-Naur form|BNF]]:
 
Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a [[parser generator]],{{Citation needed|date=February 2018}} either for an LL(''k'') language or using an alternative parser, such as [[LALR parser|LALR]] or [[LR parser|LR]]. This is particularly the case if a grammar is not in [[LL parser|LL(''k'')]] form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like [[ANTLR]].
: <E> ::= NOT <E> | ( <E> <F> ) | TRUE | FALSE
: <F> ::= AND <E> | OR <E>
 
Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.<ref>{{cite book|last1=Aho|first1=Alfred V.|last2=Sethi|first2=Ravi|last3=Ullman|first3=Jeffrey|authorlink1=Alfred V. Aho|authorlink3=Jeffrey Ullman|title=Compilers: Principles, Techniques and Tools|url=https://archive.org/details/compilers00ahoa|url-access=limited|date=1986|publisher=Addison Wesley|page=[https://archive.org/details/compilers00ahoa/page/n193 183]|edition=first}}</ref>
We define a procedure <tt>parse_''A''()</tt> for every nonterminal ''A'' that parses a string in the language of ''A''. In this procedure it is first determined, with the current symbol in the input stream, which rule for the nonterminal will be used. Then it simply calls the procedures <tt>read_ch(''a'')</tt> and <tt>parse_''A''()</tt> for every terminal ''a'' and nonterminal ''A'' in the right-hand side for the rule.
 
==Example parser==
'''procedure''' parse_E() {
The following [[Extended Backus–Naur Form|EBNF]]-like [[formal grammar|grammar]] (for [[Niklaus Wirth]]'s [[PL/0]] programming language, from ''[[Algorithms + Data Structures = Programs]]'') is in [[LL parser|LL(1)]] form:
'''if''' ''ch'' = 'N' { read_ch('N') read_ch('O') read_ch('T') parse_E() }
'''else if''' ''ch'' = '(' { read_ch('(') parse_E() parse_F() read_ch(')') }
'''else if''' ''ch'' = 'T' { read_ch('T') read_ch('R') read_ch('U') read_ch('E') }
'''else if''' ''ch'' = 'F' { read_ch('F') read_ch('A') read_ch('L') read_ch('S') read_ch('E') }
}
 
<syntaxhighlight lang="ebnf">
'''procedure''' parse_F() {
program = block "." .
'''if''' ''ch'' = 'A' { read_ch('A') read_ch('N') read_ch('D') parse_E() }
'''else if''' ''ch'' = 'O' { read_ch('O') read_ch('R') parse_E() }
block =
}
["const" ident "=" number {"," ident "=" number} ";"]
["var" ident {"," ident} ";"]
{"procedure" ident ";" block ";"} statement .
statement =
ident ":=" expression
| "call" ident
| "begin" statement {";" statement } "end"
| "if" condition "then" statement
| "while" condition "do" statement .
condition =
"odd" expression
| expression ("="|"#"|"<"|"<="|">"|">=") expression .
expression = ["+"|"-"] term {("+"|"-") term} .
term = factor {("*"|"/") factor} .
factor =
ident
| number
| "(" expression ")" .
</syntaxhighlight>
 
[[terminal symbol|Terminals]] are expressed in quotes. Each [[nonterminal symbol|nonterminal]] is defined by a rule in the grammar, except for ''ident'' and ''number'', which are assumed to be implicitly defined.
These procedures use a global variable ''ch'' that contains the current first character in the input stream. The procedure <tt>read_ch(''a'')</tt> reports an error if ''ch'' is not equal to ''a'' and otherwise reads the next character from the input stream into ''ch''.
 
===C implementation===
To determine which rule should be applied in case of a certain terminal the same algorithm can be used as the one for constructing the parsing table of an [[LL parser]].
 
What follows is an implementation of a recursive descent parser for the above language in [[C (programming language)|C]]. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.
== Formalizing recursive descent parsers ==
Although [[context-free grammar]]s are commonly used to [[formal|formalize]] the syntax of the language recognized by a recursive descent parser (as in the example above), an alternate and more direct way to formalize recursive descent parsing is via [[parsing expression grammar]]s, which model the structure and behavior of typical recursive descent parsers directly.
 
Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the current symbol from the input, and the function ''nextsym'', which updates ''sym'' when called.
== Implementation in functional languages ==
Recursive descent parsers are particularly easy to implement in functional languages such as [[Haskell programming language|Haskell]]. See [http://www.cs.nott.ac.uk/~gmh/pearl.pdf Functional Pearls: Monadic Parsing in Haskell] (pdf format).
 
The implementations of the functions ''nextsym'' and ''error'' are omitted for simplicity.
== References ==
* ''Recursive Programming Techniques'', W.H. Burge, 1975, ISBN 0-201-14450-6
* The [[Dragon book]], in particular Section 4.4
* [http://search.cpan.org/~dconway/Parse-RecDescent-1.94/lib/Parse/RecDescent.pod Parse::RecDescent]: A versatile recursive descent [[Perl]] module.
* [http://pyparsing.sourceforge.net/ pyparsing]: A versatile recursive descent [[Python (programming language)|Python]] module.
{{FOLDOC}}
 
<syntaxhighlight lang="c">
[[Category:Parsing algorithms]]
typedef enum {ident, number, lparen, rparen, times, slash, plus,
minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,
endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,
varsym, procsym, period, oddsym} Symbol;
 
Symbol sym;
[[de:Rekursiver Abstieg]]
void nextsym(void);
void error(const char msg[]);
 
int accept(Symbol s) {
if (sym == s) {
nextsym();
return 1;
}
return 0;
}
 
int expect(Symbol s) {
if (accept(s))
return 1;
error("expect: unexpected symbol");
return 0;
}
 
void factor(void) {
if (accept(ident)) {
;
} else if (accept(number)) {
;
} else if (accept(lparen)) {
expression();
expect(rparen);
} else {
error("factor: syntax error");
nextsym();
}
}
 
void term(void) {
factor();
while (sym == times || sym == slash) {
nextsym();
factor();
}
}
 
void expression(void) {
if (sym == plus || sym == minus)
nextsym();
term();
while (sym == plus || sym == minus) {
nextsym();
term();
}
}
 
void condition(void) {
if (accept(oddsym)) {
expression();
} else {
expression();
if (sym == eql || sym == neq || sym == lss || sym == leq || sym == gtr || sym == geq) {
nextsym();
expression();
} else {
error("condition: invalid operator");
nextsym();
}
}
}
 
void statement(void) {
if (accept(ident)) {
expect(becomes);
expression();
} else if (accept(callsym)) {
expect(ident);
} else if (accept(beginsym)) {
do {
statement();
} while (accept(semicolon));
expect(endsym);
} else if (accept(ifsym)) {
condition();
expect(thensym);
statement();
} else if (accept(whilesym)) {
condition();
expect(dosym);
statement();
} else {
error("statement: syntax error");
nextsym();
}
}
 
void block(void) {
if (accept(constsym)) {
do {
expect(ident);
expect(eql);
expect(number);
} while (accept(comma));
expect(semicolon);
}
if (accept(varsym)) {
do {
expect(ident);
} while (accept(comma));
expect(semicolon);
}
while (accept(procsym)) {
expect(ident);
expect(semicolon);
block();
expect(semicolon);
}
statement();
}
 
void program(void) {
nextsym();
block();
expect(period);
}
</syntaxhighlight>
 
== Examples ==
Some recursive descent parser generators:
*[[TMG (language)|TMG]] – an early compiler-compiler used in the 1960s and early 1970s
*[[JavaCC]]
*[[Coco/R]]
*[[ANTLR]]
*[[Spirit Parser Framework]] – a C++ recursive descent parser generator framework requiring no pre-compile step
*[[parboiled (Java)]] – a recursive descent [[Parsing expression grammar|PEG]] parsing library for [[Java (programming language)|Java]]
 
The C++ front-end of the [[Clang]] compiler contains a hand-written parser based on the recursive-descent parsing algorithm. <ref>How Clang handles the type / variable name ambiguity of C/C++
https://eli.thegreenplace.net/2012/07/05/how-clang-handles-the-type-variable-name-ambiguity-of-cc/</ref>
 
==See also==
*[[Parser combinator]] – a higher-order function used in combinatory parsing, a method of factoring recursive descent parser designs
*[[Parsing expression grammar]] – another form representing recursive descent grammar
*[[Recursive ascent parser]]
*[[Tail recursive parser]] – a variant of the recursive descent parser
 
==References==
{{Reflist}}
 
===General references===
* ''[[Compilers: Principles, Techniques, and Tools]]'', first edition, Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4.
* ''Modern Compiler Implementation in Java, Second Edition'', Andrew Appel, 2002, {{ISBN|0-521-82060-X}}.
* ''Recursive Programming Techniques'', W.H. Burge, 1975, {{ISBN|0-201-14450-6}}
* ''Crafting a Compiler with C'', Charles N Fischer and Richard J LeBlanc, Jr, 1991, {{ISBN|0-8053-2166-7}}.
* ''Compiling with C# and Java'', Pat Terry, 2005, {{ISBN|0-321-26360-X}}, 624
* ''Algorithms + Data Structures = Programs'', Niklaus Wirth, 1975, {{ISBN|0-13-022418-9}}
* ''Compiler Construction'', Niklaus Wirth, 1996, {{ISBN|0-201-40353-6}}
 
==External links==
*[https://compilers.iecc.com/crenshaw/ Jack W. Crenshaw: ''Let's Build A Compiler'' (1988-1995)], in [[Pascal (programming language)|Pascal]], with [[assembly language]] output, using a "keep it simple" approach
 
{{Parsers}}
 
[[Category:Parsing algorithms]]
[[Category:Articles with example C code]]