CYK algorithm: Difference between revisions

Content deleted Content added
The deleted statement doesn't contribute anything unless elaborated, IMO
Bluebot (talk | contribs)
Unicodifying
Line 4:
The standard version of CYK recognizes languages defined by context-free grammars written in [[Chomsky normal form]] (CNF). Since any context-free grammar can be converted to CNF without too much difficulty, CYK can be used to recognize any context-free language. It is also possible to extend the CYK algorithm to handle some context-free grammars which are not written in CNF; this may be done to improve performance, although at the cost of making the algorithm harder to understand.
 
The worst case [[asymptotic]] time complexity of CYK is [[Big O notation|&Theta;Θ]](n<sup>3</sup>), where ''n'' is the length of the parsed string. This makes it one of the most efficient (in those terms) algorithms for recognizing any context-free language. However, there are other algorithms that will perform better for certain subsets of the context-free languages.
 
==The algorithm==
Line 16:
'''Let''' P[n,n,r] be an array of booleans. Initialize all elements of P to false.
'''For each''' i = 1 to n
'''For each''' unit production R<sub>j</sub> &rarr; a<sub>i</sub>, '''set''' P[i,1,j] = true.
'''For each''' i = 2 to n -- Length of span
'''For each''' j = 1 to n-i+1 -- Start of span
Line 28:
===Informally===
 
In informal terms, this algorithm considers every possible substring of the string of letters and sets P[i,j,k] to be true if the substring of letters starting from i of length j can be generated from R<sub>k</sub>. Once it has considered substrings of length 1, it goes on to substrings of length 2, and so on. For substrings of length 2 and greater, it considers every possible partition of the substring into two halves, and checks to see if there is some production P &rarr; Q R such that Q matches the first half and R matches the second half. If so, it records P as matching the whole substring. Once this process is completed, the sentence is recognized by the grammar if the substring containing the entire string is matched by the start symbol.
 
{| class="wikitable"
Line 61:
* T. Kasami (1965). An efficient recognition and syntax-analysis algorithm for context-free languages. Scientific report AFCRL-65-758, Air Force Cambridge Research Lab, Bedford, MA.
* Daniel H. Younger (1967). Recognition and parsing of context-free languages in time ''n''<sup>3</sup>. ''Information and Control'' 10(2): 189&ndash;208.
 
[[Category:Parsing algorithms]]