CYK algorithm: Difference between revisions

Content deleted Content added
Rephrasing introduction
Rephrasing
Line 1:
The '''Cocke-Younger-Kasami (CYK) algorithm''' (alternatively called CKY) determines whether a
[[string (computer science)|string]] can be generated by a given [[context-free grammar]] and, if so, how it can be generated. This is known as [[parsing]] the string. The algorithm isemploys an[[bottom-up exampleparsing]] ofand [[dynamic programming]].
 
The standard version of CYK recognizesoperates languages defined byon context-free grammars writtengiven in [[Chomsky normal form]] (CNF). Since anyAny context-free grammar canmay be converted to CNF without too much difficulty, CYK can be used to recognize any context-freewritten languagethusly.
 
The importance of the CYK algorithm stems formfrom the fact that it constructively proves that [[membership problem]] for context-free languages is [[decision problem|decidable]] (see also: [[theory of computation]]) and the fact that it does so quite efficiently.
The worst case [[asymptotic]] time complexity of CYK is [[Big O notation|Θ]](n<sup>3</sup>), where ''n'' is the length of the parsed string. This makes it one of the most efficient (in those terms) algorithms for recognizing general context-free languages. Specialized and faster algorithms exist for certain subsets of the context-free languages.
 
==Standard form==
The algorithm foras thegiven membershipin problem[[pseudocode]] is as follows:
The CYK algorithm is important to the [[theory of computation]], since it can be used to constructively prove that the [[membership problem]] for context-free languages is [[decision problem|decidable]].
===As pseudocode===
The algorithm employs [[bottom-up parsing]].
 
The algorithm for the membership problem is as follows:
'''Let''' the input string consist of ''n'' letters, ''a''<sub>1</sub> ... ''a''<sub>''n''</sub>.
'''Let''' the grammar contain ''r'' terminal and nonterminal symbols ''R''<sub>1</sub> ... ''R''<sub>''r''</sub>.
Line 27 ⟶ 25:
'''Else''' string is not member of language
 
===InformallyAs prose===
 
In informal terms, this algorithm considers every possible subsequence of the sequence of words and sets P[i,j,k] to be true if the subsequence of words starting from i of length j can be generated from R<sub>k</sub>. Once it has considered subsequences of length 1, it goes on to subsequences of length 2, and so on. For subsequences of length 2 and greater, it considers every possible partition of the subsequence into two parts, and checks to see if there is some production P → Q R such that Q matches the first part and R matches the second part. If so, it records P as matching the whole subsequence. Once this process is completed, the sentence is recognized by the grammar if the subsequence containing the entire sentence is matched by the start symbol.