Recursive descent parser: Difference between revisions

Content deleted Content added
m linking
 
(40 intermediate revisions by 30 users not shown)
Line 1:
{{short description|Algorithm}}
{{More footnotes|date=February 2009}}
 
In [[computer science]], a '''recursive descent parser''' is a kind of [[top-down parsing|top-down parser]] built from a set of [[mutual recursion|mutually recursive]] procedures (or a non-recursive equivalent) where each such [[procedure (computer science)|procedure]] implements one of the [[Terminal and nonterminal symbols|nonterminals]] of the [[formal grammar|grammar]]. Thus the structure of the resulting program closely mirrors that of the grammar it recognizes.<ref>{{FOLDOC|Recursive+descent+parser}}</ref><ref>{{cite book | title=Recursive Programming Techniques | author=Burge, W.H. | year=1975 | publisher=Addison-Wesley Publishing Company | isbn=0-201-14450-6 | url-access=registration | url=https://archive.org/details/recursiveprogram0000burg }}</ref>
 
A ''predictive parser'' is a recursive descent parser that does not require [[backtracking]].<ref name="Watson2017">{{cite book|last=Watson|first=Des|title=A Practical Approach to Compiler Construction|url=https://books.google.com/books?id=05B0DgAAQBAJ&q=%22predictive+parser%22|date=22 March 2017|publisher=Springer|isbn=978-3-319-52789-5}}</ref> Predictive parsing is possible only for the class of [[LL parser|LL(''k'')]] grammars, which are the [[context-free grammar]]s for which there exists some positive integer ''k'' that allows a recursive descent parser to decide which production to use by examining only the next ''k'' tokens of input. The LL(''k'') grammars therefore exclude all [[ambiguous grammarsgrammar]]s, as well as all grammars that contain [[left recursion]]. Any context-free grammar can be transformed into an equivalent grammar that has no left recursion, but removal of left recursion does not always yield an LL(''k'') grammar. A predictive parser runs in [[linear time]].
 
Recursive descent with backtracking is a technique that determines which [[Production rule (formal languages)|production]] to use by trying each production in turn. Recursive descent with backtracking is not limited to LL(''k'') grammars, but is not guaranteed to terminate unless the grammar is LL(''k''). Even when they terminate, parsers that use recursive descent with backtracking may require [[exponential time]].
 
Although predictive parsers are widely used, and are frequently chosen if writing a parser by hand, programmers often prefer to use a table-based parser produced by a [[parser generator]],{{Citation needed|date=February 2018}}, either for an LL(''k'') language or using an alternative parser, such as [[LALR parser|LALR]] or [[LR parser|LR]]. This is particularly the case if a grammar is not in [[LL parser|LL(''k'')]] form, as transforming the grammar to LL to make it suitable for predictive parsing is involved. Predictive parsers can also be automatically generated, using tools like [[ANTLR]].
 
Predictive parsers can be depicted using transition diagrams for each non-terminal symbol where the edges between the initial and the final states are labelled by the symbols (terminals and non-terminals) of the right side of the production rule.<ref>{{cite book|lastlast1=Aho|firstfirst1=Alfred V.|last2=Sethi|first2=Ravi|last3=Ullman|first3=Jeffrey|authorlink1=Alfred V. Aho|authorlink3=Jeffrey Ullman|title=Compilers: Principles, Techniques and Tools|url=https://archive.org/details/compilers00ahoa|url-access=limited|date=1986|publisher=Addison Wesley|page=[https://archive.org/details/compilers00ahoa/page/n193 183]|edition=first}}</ref>
 
==Example parser==
The following [[Extended Backus–Naur Form|EBNF]]-like [[formal grammar|grammar]] (for [[Niklaus Wirth]]'s [[PL/0]] programming language, from ''[[Algorithms + Data Structures = Programs]]'') is in [[LL parser|LL(1)]] form:
 
<sourcesyntaxhighlight lang="ebnf">
program = block "." .
block =
["const" ident "=" numnumber {"," ident "=" numnumber} ";"]
["var" ident {"," ident} ";"]
{"procedure" ident ";" block ";"} statement .
Line 41 ⟶ 42:
| number
| "(" expression ")" .
</syntaxhighlight>
</source>
 
[[terminal symbol|Terminals]] are expressed in quotes. Each [[nonterminal symbol|nonterminal]] is defined by a rule in the grammar, except for ''ident'' and ''number'', which are assumed to be implicitly defined.
Line 47 ⟶ 48:
===C implementation===
 
What follows is aan implementation of a recursive descent parser for the above language in [[C (programming language)|C]]. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.
 
Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the current symbol from the input, and the function ''nextsym'', which updates ''sym'' when called.
 
The implementations of the functions ''nextsym'' and ''error'' are omitted for simplicity.
 
<sourcesyntaxhighlight lang="c">
typedef enum {ident, number, lparen, rparen, times, slash, plus,
minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,
Line 179 ⟶ 180:
expect(period);
}
</syntaxhighlight>
</source>
 
== Examples ==
===Pascal implementation===
Some recursive descent parser generators:
*[[TMG (language)|TMG]] – an early compiler-compiler used in the 1960s and early 1970s
*[[JavaCC]]
*[[Coco/R]]
*[[ANTLR]]
*[[Spirit Parser Framework]] – a C++ recursive descent parser generator framework requiring no pre-compile step
*[[parboiled (Java)]] – a recursive descent [[Parsing expression grammar|PEG]] parsing library for [[Java (programming language)|Java]]
 
The C++ front-end of the [[Clang]] compiler contains a hand-written parser based on the recursive-descent parsing algorithm. <ref>How Clang handles the type / variable name ambiguity of C/C++
What follows is an implementation of a recursive descent parser for the above language in [[Pascal (programming language)|ModernPascal]]. The parser reads in source code, and exits with an error message if the code fails to parse, exiting silently if the code parses correctly.
https://eli.thegreenplace.net/2012/07/05/how-clang-handles-the-type-variable-name-ambiguity-of-cc/</ref>
 
Notice how closely the predictive parser below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the current symbol from the input, and the function ''nextsym'', which updates ''sym'' when called.
 
The implementations of the functions ''nextsym'' and ''error'' are omitted for simplicity.
 
<source lang="pascal">
// Modern Pascal implementation (www.ModernPascal.com)
// C example ported to Pascal by Ozz Nixon
 
{$S-}
 
type
SYMBOL = (ident, number, lparen, rparen, times, slash, plus,
minus, eql, neq, lss, leq, gtr, geq, callsym, beginsym, semicolon,
endsym, ifsym, whilesym, becomes, thensym, dosym, constsym, comma,
varsym, procsym, period, oddsym);
 
var
sym:SYMBOL;
 
procedure expression; forward;
 
procedure nextsym;
begin
// you implement //
end;
 
procedure error(const msg:String);
begin
// you implement //
end;
 
function accept(S:Symbol):boolean;
begin
if sym=s then begin
nextsym();
Result:=True;
end
else Result:=False;
end;
 
function expect(S:Symbol):boolean;
begin
Result:=accept(s);
if not result then error('Expect: unexpected symbol');
end;
 
procedure factor;
begin
if (accept(ident)) then begin
//
end
else if (accept(number)) then begin
//
end
else if (accept(lparen)) then begin
expression();
expect(rparen);
end
else begin
error('factor: syntax error');
nextsym();
end;
end;
 
procedure term;
begin
factor();
while (sym=times) or (sym=slash) do begin
nextsym();
factor();
end;
end;
 
procedure expression;
begin
if (sym=plus) or (sym=minus) then nextsym();
term();
while (sym=plus) or (sym=minus) do begin
nextsym();
term();
end;
end;
 
procedure condition;
begin
if (accept(oddsym)) then begin
expression();
end
else begin
expression();
if (sym=eql) or (sym=neq) or (sym=lss) or (sym=leq) or (sym=gtr) or (sym=geq) then begin
nextsym();
expression();
end
else begin
error('condition: invalid operator');
nextsym();
end;
end;
end;
 
procedure statement;
begin
if (accept(ident)) then begin
expect(becomes);
expression();
end
else if (accept(callsym)) then begin
expect(ident);
end
else if (accept(beginsym)) then begin
repeat
statement();
until not (accept(semicolon));
expect(endsym);
end
else if (accept(ifsym)) then begin
condition();
expect(thensym);
statement();
end
else if (accept(whilesym)) then begin
condition();
expect(dosym);
statement();
end
else begin
error('statement: syntax error');
nextsym();
end;
end;
 
procedure block;
begin
if (accept(constsym)) then begin
repeat
expect(ident);
expect(eql);
expect(number);
until not (accept(comma));
expect(semicolon);
end;
if (accept(varsym)) then begin
repeat
expect(ident);
until not (accept(comma));
expect(semicolon);
end;
while (accept(procsym)) do begin
expect(ident);
expect(semicolon);
block();
expect(semicolon);
end;
statement();
end;
 
begin
nextsym();
block();
expect(period);
end;
</source>
 
==See also==
*[[Parser combinator]] – a higher-order function used in combinatory parsing, a method of factoring recursive descent parser designs
{{Portal|Computer science}}
*[[JavaCC]] – a recursive descent parser generator
*[[Coco/R]] – a recursive descent parser generator
*[[ANTLR]] – a recursive descent parser generator
*[[Parsing expression grammar]] – another form representing recursive descent grammar
*[[Recursive ascent parser]]
*[[Spirit Parser Framework]] – a C++ recursive descent parser generator framework requiring no pre-compile step
*[[Tail recursive parser]] – a variant of the recursive descent parser
*[[parboiled (Java)]] – a recursive descent [[Parsing_expression_grammar|PEG]] parsing library for [[Java (programming language)|Java]]
*[[Recursive ascent parser]]
* [http://sourceforge.net/projects/bnf2xml/ bnf2xml] Markup input with XML tags using advanced BNF matching. (a top town LL recursive parser, front to back text, no compiling of lexor is needed or used)
* [https://metacpan.org/module/Parse::RecDescent Parse::RecDescent]: A versatile recursive descent [[Perl]] module.
* [http://pyparsing.wikispaces.com/ pyparsing]: A versatile [[Python (programming language)|Python]] recursive parsing module that is not recursive descent ([https://mail.python.org/pipermail/python-list/2007-November/416678.html python-list post]).
* [http://jparsec.codehaus.org/ Jparsec] a Java port of the Haskell [[Parsec (parser)|Parsec]] module.
 
==References==
{{Reflist}}
 
===General references===
{{reflist}}
{{FOLDOC}}
* ''[[Compilers: Principles, Techniques, and Tools]]'', first edition, Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4.
* ''Modern Compiler Implementation in Java, Second Edition'', Andrew Appel, 2002, {{ISBN|0-521-82060-X}}.
Line 377 ⟶ 213:
 
==External links==
*[https://compilers.iecc.com/crenshaw/ Jack W. Crenshaw: ''Let's Build A Compiler'' (1988-1995)], in [[Pascal (programming language)|Pascal]], with [[assembly language]] output, using a "keep it simple" approach
*[http://www.mollypages.org/page/grammar/index.mp Introduction to Parsing] - an easy to read introduction to parsing, with a comprehensive section on recursive descent parsing
*[http://teaching.idallen.com/cst8152/98w/recursive_decent_parsing.html How to turn a Grammar into C code] - a brief tutorial on implementing recursive descent parser
*[http://lukaszwrobel.pl/blog/math-parser-part-3-implementation Simple mathematical expressions parser] in [[Ruby (programming language)|Ruby]]
*[http://effbot.org/zone/simple-top-down-parsing.htm Simple Top Down Parsing in Python]
*[http://compilers.iecc.com/crenshaw/ Jack W. Crenshaw: ''Let's Build A Compiler'' (1988-1995)], in [[Pascal (programming language)|Pascal]], with [[assembly language]] output, using a "keep it simple" approach
*[http://www.cs.nott.ac.uk/~gmh/pearl.pdf Functional Pearls: Monadic Parsing in Haskell]
 
{{Parsers}}