Parsing expression grammar: Difference between revisions

Content deleted Content added
Theory of parsing expression grammars: Undecidability of language emptiness.
Line 371:
 
The class of parsing expression languages is closed under set intersection and complement, thus also under set union.<ref name="For04" />{{rp|Sec.3.4}}
 
=== Undecidability of emptiness ===
In stark contrast to the case for context-free grammars, it is not possible to generate elements of a parsing expression language from its grammar. Indeed, it is algorithmically [[undecidable problem|undecidable]] whether the language recognised by a parsing expression grammar is empty! One reason for this is that any instance of the [[Post correspondence problem]] reduces to an instance of the problem of deciding whether a parsing expression language is empty.
 
Recall that an instance of the Post correspondence problem consists of a list <math> (\alpha_1,\beta_1), (\alpha_2,\beta_2), \dotsc, (\alpha_n,\beta_n) </math> of pairs of strings (of terminal symbols). The problem is to determine whether there exists a sequence <math>\{k_i\}_{i=1}^m</math> of indices in the range <math>\{1,\dotsc,n\}</math> such that <math> \alpha_{k_1} \alpha_{k_2} \dotsb \alpha_{k_m} = \beta_{k_1} \beta_{k_2} \dotsb \beta_{k_m} </math>. To [[reduction (complexity)|reduce]] this to a parsing expression grammar, let <math> \gamma_0, \gamma_1, \dotsc, \gamma_n </math> be arbitrary pairwise distinct equally long strings of terminal symbols (already with <math>2</math> distinct symbols in the terminal symbol alphabet, length <math>\lceil \log_2(n+1) \rceil</math> suffices) and consider the parsing expression grammar
<math display="block">
\begin{aligned}
S &\leftarrow \&(A \, !.) \&(B \, !.) (\gamma_1/\dotsb/\gamma_n)^+ \gamma_0 \\
A &\leftarrow \gamma_0 / \gamma_1 A \alpha_1 / \dotsb / \gamma_n A \alpha_n \\
B &\leftarrow \gamma_0 / \gamma_1 B \beta_1 / \dotsb / \gamma_n B \beta_n
\end{aligned}
</math>
Any string matched by the nonterminal <math>A</math> has the form <math>\gamma_{k_m} \dotsb \gamma_{k_2} \gamma_{k_1} \gamma_0 \alpha_{k_1} \alpha_{k_2} \dotsb \alpha_{k_m}</math> for some indices <math>k_1,k_2,\dotsc,k_m</math>. Likewise any string matched by the nonterminal <math>B</math> has the form <math>\gamma_{k_m} \dotsb \gamma_{k_2} \gamma_{k_1} \gamma_0 \beta_{k_1} \beta_{k_2} \dotsb \beta_{k_m}</math>. Thus any string matched by <math>S</math> will have the form <math>\gamma_{k_m} \dotsb \gamma_{k_2} \gamma_{k_1} \gamma_0 \rho</math> where <math> \rho = \alpha_{k_1} \alpha_{k_2} \dotsb \alpha_{k_m} = \beta_{k_1} \beta_{k_2} \dotsb \beta_{k_m}</math>.
 
== Practical use ==