CYK algorithm: Difference between revisions

Content deleted Content added
The algorithm: Rephrasing
Renamed headers; "The Algorithm" -> "Standard form"; "Algorithm extensions" -> "Extensions"
Line 6:
The worst case [[asymptotic]] time complexity of CYK is [[Big O notation|Θ]](n<sup>3</sup>), where ''n'' is the length of the parsed string. This makes it one of the most efficient (in those terms) algorithms for recognizing any context-free language. However, there are other algorithms that will perform better for certain subsets of the context-free languages.
 
==TheStandard algorithmform==
The CYK algorithm is important to the [[theory of computation]], since it can be used to constructively prove that the [[membership problem]] for context-free languages is [[decision problem|decidable]].
The algorithm employs [[bottom-up parsing]].
Line 50:
|}
 
==Extensions==
===Algorithm extensions===
====Generating a parse tree====
It is simple to extend the above algorithm to not only determine if a sentence is in a language, but to also construct a [[parse tree]], by storing parse tree nodes as elements of the array, instead of booleans. Since the grammars being recognized can be ambiguous, it is necessary to store a list of nodes (unless one wishes to only pick one possible parse tree); the end result is then a forest of possible parse trees.
An alternative formulation employs a second table B[n,n,r] of so-called ''backpointers''.
 
====Parsing non-CNF context-free grammars====
It is also possible to extend the CYK algorithm to handle some context-free grammars which are not written in CNF; this may be done to improve performance, although at the cost of making the algorithm harder to understand.
 
====Parsing weighted context-free grammars====
It is also possible to extend the CYK algorithm to parse strings using [[weighted context-free grammar|weighted]] and [[stochastic context-free grammar]]s. Weights (probabilities) are then stored in the table P instead of booleans, so P[i,j,A] will contain the minimum weight (maximum probability) that the substring from i to j can be derived from A. Further extensions of the algorithm allow all parses of a string to be enumerated from lowest to highest weight (highest to lowest probability).