Parsing expression grammar: Difference between revisions

Content deleted Content added
Definition: Terminals and nonterminals. Subsections for atomic and composite parsing expressions, as well as grammars.
Grammars: Concrete syntax
Line 82:
 
==== Grammars ====
In the concrete syntax, a parsing expression grammar is simply a sequence of nonterminal definitions, each of which has the form
As a mathematical object, a parsing expression grammar is a tuple <math>(N,\Sigma,P,e_S)</math>, where <math>N</math> is the set of nonterminal symbols, <math>\Sigma</math> is the set of terminal symbols, <math>P</math> is a [[Function (mathematics)|function]] from <math>N</math> to the set of parsing expressions on <math>N \cup \Sigma</math>, and <math>e_S</math> is the starting parsing expression.
<syntaxhighlight lang="peg">
Identifier LEFTARROW Expression
</syntaxhighlight>
The <code>Identifier</code> is the nonterminal being defined, and the <code>Expression</code> is the parsing expression it is defined as referencing. The <code>LEFTARROW</code> varies a bit between dialects, but is generally some left-pointing arrow or assignment symbol, such as <code><-</code>, <code>←</code>, <code>:=</code>, or <code>=</code>. One way to understand it is precisely as making an assignment or definition of the nonterminal. Another way to understand it is as a contrast to the right-pointing arrow → used in the rules of a [[context-free grammar]]; with parsing expressions the flow of information goes from expression to nonterminal, not nonterminal to expression.
 
As a mathematical object, a parsing expression grammar is a tuple <math>(N,\Sigma,P,e_S)</math>, where <math>N</math> is the set of nonterminal symbols, <math>\Sigma</math> is the set of terminal symbols, <math>P</math> is a [[Function (mathematics)|function]] from <math>N</math> to the set of parsing expressions on <math>N \cup \Sigma</math>, and <math>e_S</math> is the starting parsing expression. Some concrete syntax dialects give the starting expression explicitly,<ref name="ptKupries">{{cite web |last1=Kupries |first1=Andreas |title=pt::peg_language - PEG Language Tutorial |url=https://core.tcl-lang.org/tcllib/doc/tcllib-1-21/embedded/md/tcllib/files/modules/pt/pt_peg_language.md |website=Tcl Library Source Code |access-date=14 January 2024}}</ref> but the primary concrete syntax instead has the implicit rule that the first nonterminal defined is the starting expression.
<!--- old text:
Formally, a parsing expression grammar consists of:
* A finite set ''N'' of ''[[nonterminal symbol]]s''.
* A finite set Σ of ''[[terminal symbol]]s'' that is [[disjoint sets|disjoint]] from ''N''.
* A finite set ''P'' of ''parsing rules''.
* An expression ''e<sub>S</sub>'' termed the ''starting expression''.
 
It is worth noticing that the primary dialect of concrete syntax parsing expression grammars does not have an explicit definition terminator or separator between definitions, although it is customary to begin a new definition on a new line; the <code>LEFTARROW</code> of the next definition is sufficient for finding the boundary, if one adds the constraint that a nonterminal in an <code>Expression</code> must not be followed by a <code>LEFTARROW</code>. However, some dialects may allow an explicit terminator, or outright require<ref name="ptKupries"/> it.
Each parsing rule in ''P'' has the form ''A'' ← ''e'', where ''A'' is a nonterminal symbol and ''e'' is a ''parsing expression''.
--->
 
=== Semantics ===