Recursive descent parser

This is an old revision of this page, as edited by WilliamDClinger (talk | contribs) at 17:12, 7 March 2006 (References). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

A recursive descent parser is a top-down parser built from a set of mutually-recursive procedures (or a non-recursive equivalent) where each such procedure usually implements one of the production rules of the grammar. Thus the structure of the resulting program closely mirrors that of the grammar it recognises.

Recursive descent parsers recognize the class of LL(k) grammars for unbounded k. In less formal terms, they recognize all context-free grammars except those that are left recursive. The running time of a recursive descent parser in the general case is exponential, although there is a modification to the algorithm called a packrat parser, which trades memory for linear time. For a large class of grammars, it is also possible to eliminate left recursion and backtracking.

Recursive descent parsers are used less often in practice than LR or LALR parsers which are easier to write using parser generators. However, recursive descent parsers have the advantage that they can pass information down to subrules, allowing for parameterized nonterminals. This allows these parsers to match a certain class of non-context-free languages.

Example parser

Given the following EBNF grammar (for Niklaus Wirth's PL/0 programming language, from Algorithms + Data Structures = Programs):

  program = block "." .

  block =
    [ "const" ident "=" number {"," ident "=" number} ";"]
      [ "var" ident {"," ident} ";"]
      { "procedure" ident ";" block ";" } statement .

  statement =
    [ ident ":=" expression
    | "call" ident
    | "begin" statement {";" statement } "end"
    | "if" condition "then" statement
    | "while" condition "do" statement
    ].

  condition =
    "odd" expression
    | expression ("="|"#"|"<"|"<="|">"|">=") expression
    .

  expression = [ "+"|"-"] term { ("+"|"-") term} .

  term = factor {("*"|"/") factor} .

  factor = ident | number | "(" expression ")" .

Terminals are expressed in quotes (except for the well defined ident and number). Each nonterminal is defined by a rule in the grammar.

Notice how closely the code below mirrors the grammar above. There is a procedure for each nonterminal in the grammar. Parsing descends in a top-down manner, until the final nonterminal has been processed. The program fragment depends on a global variable, sym, which contains the next symbol from the input, and the global function getsym, which updates sym when called.

int accept(Symbol s) {
  if (sym == s) {
    getsym();
    return 1;
  }
  return 0;
}

int expect(Symbol s) {
  if (accept(s))
    return 1;
  error("expect: unexpected symbol");
  return 0;
}

void factor(void) {
  if (accept(ident)) {
    ;
  } else if (accept(number)) {
    ;
  } else if (accept(lparen)) {
    expression();
    expect(rparen);
  } else {
    error("factor: syntax error");
    getsym();
  }
}

void term(void) {
  factor();
  while (sym == times || sym == slash) {
    getsym();
    factor();
  }
}

void expression(void) {
  if (sym == plus || sym == minus)
    getsym();
  term();
  while (sym == plus || sym == minus) {
    getsym();
    term();
  }
}

void condition(void) {
  if (accept(oddsym)) {
    expression();
  } else {
    expression();
    if (sym == eql || sym == neq || sym == lss ||
        sym == leq || sym == gtr || sym == geq) {
      getsym();
      expression();
    } else {
      error("condition: invalid operator");
      getsym();
    }
  }
}

void statement(void) {
  if (accept(ident)) {
    expect(becomes);
    expression();
  } else if (accept(callsym)) {
    expect(ident);
  } else if (accept(beginsym)) {
    do {
      statement();
    } while (accept(semicolon));
    expect(endsym);
  } else if (accept(ifsym)) {
    condition();
    expect(thensym);
    statement();
  } else if (accept(whilesym)) {
    condition();
    expect(dosym);
    statement();
  }
}

void block(void) {
  if (accept(constsym)) {
    do {
      expect(ident);
      expect(eql);
      expect(number);
    } while (accept(comma));
    expect(semicolon);
  }
  if (accept(varsym)) {
    do {
      expect(ident);
    } while (accept(comma));
    expect(semicolon);
  }
  while (accept(procsym)) {
    expect(ident);
    expect(semicolon);
    block();
    expect(semicolon);
  }

  statement();
}

void program(void) {
  getsym();
  block();
  expect(period);
}


Formalizing recursive descent parsers

Although context-free grammars are commonly used to formalize the syntax of the language recognized by a recursive descent parser (as in the example above), an alternate and more direct way to formalize recursive descent parsing is via parsing expression grammars, which model the structure and behavior of typical recursive descent parsers directly.

Implementation in functional languages

Recursive descent parsers are particularly easy to implement in functional languages such as Haskell. See Functional Pearls: Monadic Parsing in Haskell (pdf format).

See Also

  • Coco/R - a recursive descent parser generator
  • ANTLR - a recursive descent parser generator

References

  • The Dragon book, Alfred V Aho, Ravi Sethi, and Jeffrey D Ullman, in particular Section 4.4.
  • Modern Compiler Implementation in Java, Second Edition, Andrew Appel, 2002, ISBN 0-521-82060-X.
  • Recursive Programming Techniques, W.H. Burge, 1975, ISBN 0-201-14450-6
  • Crafting a Compiler with C, Charles N Fischer and Richard J LeBlanc, Jr, 1991, ISBN 0-8053-2166-7.
  • Parse::RecDescent: A versatile recursive descent Perl module.
  • pyparsing: A versatile recursive descent Python module.
  • Compiling with C# and Java, Pat Terry, 2005, ISBN/ISSN: 0-321-26360-X 624

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.

  • Algorithms + Data Structures = Programs, Niklaus Wirth, 1975, ISBN 0-13-022418-9
  • Compiler Construction, Niklaus Wirth, 1996, ISBN 0-201-40353-6