Recursive descent parser: Difference between revisions

Content deleted Content added
Chobot (talk | contribs)
m robot Adding: ko
More extensive example, additional references
Line 6:
 
== Example parser ==
ConsiderGiven the following [[formalEBNF]] grammar|[[grammar]] in(for [[Backus-NaurNiklaus form|BNFWirth]]:'s [[PL/0]], from
[[Algorithms + Data Structures = Programs]]):
 
<pre>
: <E> ::= NOT <E> | ( <E> <F> ) | TRUE | FALSE
program = block "." .
: <F> ::= AND <E> | OR <E>
 
block =
We define a procedure <tt>parse_''A''()</tt> for every nonterminal ''A'' that parses a string in the language of ''A''. In this procedure it is first determined, with the current symbol in the input stream, which rule for the nonterminal will be used. Then it simply calls the procedures <tt>read_ch(''a'')</tt> and <tt>parse_''A''()</tt> for every terminal ''a'' and nonterminal ''A'' in the right-hand side for the rule.
[ "const" ident "=" number {"," ident "=" number} ";"]
[ "var" ident {"," ident} ";"]
{ "procedure" ident ";" block ";" } statement .
 
statement =
'''procedure''' parse_E() {
[ ident ":=" expression
'''if''' ''ch'' = 'N' { read_ch('N') read_ch('O') read_ch('T') parse_E() }
| "call" ident
'''else if''' ''ch'' = '(' { read_ch('(') parse_E() parse_F() read_ch(')') }
| "begin" statement {";" statement } "end"
'''else if''' ''ch'' = 'T' { read_ch('T') read_ch('R') read_ch('U') read_ch('E') }
| "if" condition "then" statement
'''else if''' ''ch'' = 'F' { read_ch('F') read_ch('A') read_ch('L') read_ch('S') read_ch('E') }
| "while" condition "do" statement
].
 
condition =
"odd" expression
| expression ("="|"#"|"<"|"<="|">"|">=") expression
.
 
expression = [ "+"|"-"] term { ("+"|"-") term} .
 
term = factor {("*"|"/") factor} .
 
factor = ident | number | "(" expression ")" .
</pre>
 
[[Terminals]] are expressed in quotes (except for the well defined 'ident'
and 'number'). Each [[nonterminal]] is defined by a rule in the grammar.
 
Notice how closely the code below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, ''sym'', which contains the next symbol from the input, and the global function ''getsym'', which updates ''sym'' when called.
 
<pre>
int accept(Symbol s) {
if (sym == s) {
getsym();
return 1;
}
return 0;
}
 
int expect(Symbol s) {
'''procedure''' parse_F() {
if (accept(s))
'''if''' ''ch'' = 'A' { read_ch('A') read_ch('N') read_ch('D') parse_E() }
return 1;
'''else if''' ''ch'' = 'O' { read_ch('O') read_ch('R') parse_E() }
error("expect: unexpected symbol");
return 0;
}
 
void factor(void) {
if (accept(ident)) {
;
} else if (accept(number)) {
;
} else if (accept(lparen)) {
expression();
expect(rparen);
} else {
error("factor: syntax error");
getsym();
}
}
 
void term(void) {
factor();
while (sym == times || sym == slash) {
getsym();
factor();
}
}
 
void expression(void) {
if (sym == plus || sym == minus)
getsym();
term();
while (sym == plus || sym == minus) {
getsym();
term();
}
}
 
void condition(void) {
if (accept(oddsym)) {
expression();
} else {
expression();
if (sym == eql || sym == neq || sym == lss ||
sym == leq || sym == gtr || sym == geq) {
getsym();
expression();
} else {
error("condition: invalid operator");
getsym();
}
}
}
 
void statement(void) {
if (accept(ident)) {
expect(becomes);
expression();
} else if (accept(callsym)) {
expect(ident);
} else if (accept(beginsym)) {
do {
statement();
} while (accept(semicolon));
expect(endsym);
} else if (accept(ifsym)) {
condition();
expect(thensym);
statement();
} else if (accept(whilesym)) {
condition();
expect(dosym);
statement();
}
}
 
void block(void) {
if (accept(constsym)) {
do {
expect(ident);
expect(eql);
expect(number);
} while (accept(comma));
expect(semicolon);
}
if (accept(varsym)) {
do {
expect(ident);
} while (accept(comma));
expect(semicolon);
}
while (accept(procsym)) {
expect(ident);
expect(semicolon);
block();
expect(semicolon);
}
 
statement();
}
 
void program(void) {
getsym();
block();
expect(period);
}
</pre>
 
These procedures use a global variable ''ch'' that contains the current first character in the input stream. The procedure <tt>read_ch(''a'')</tt> reports an error if ''ch'' is not equal to ''a'' and otherwise reads the next character from the input stream into ''ch''.
 
To determine which rule should be applied in case of a certain terminal the same algorithm can be used as the one for constructing the parsing table of an [[LL parser]].
 
== Formalizing recursive descent parsers ==
Line 34 ⟶ 167:
== Implementation in functional languages ==
Recursive descent parsers are particularly easy to implement in functional languages such as [[Haskell programming language|Haskell]]. See [http://www.cs.nott.ac.uk/~gmh/pearl.pdf Functional Pearls: Monadic Parsing in Haskell] (pdf format).
 
== See Also ==
*[[PL/0]] - for more about the programming language PL/0
*[[COCO/R]] - a recursive descent parser generator
*[[ANTLR]] - a recursive descent parser generator
 
== References ==
Line 41 ⟶ 179:
* [http://pyparsing.sourceforge.net/ pyparsing]: A versatile recursive descent [[Python (programming language)|Python]] module.
{{FOLDOC}}
* ''Algorithms + Data Structures = Programs'', Niklaus Wirth, 1975, ISBN 0-13-022418-9
* ''Compiler Construction'', Niklaus Wirth, 1996, ISBN 0-201-40353-6
* ''Compiling with C# and Java'', Pat Terry, 2005, ISBN/ISSN: 0-321-26360-X 624
 
[[Category:Parsing algorithms]]