Revision as of 03:20, 1 September 2006 edit David Chiang (talk \| contribs) 74 edits The deleted statement doesn't contribute anything unless elaborated, IMO ← Previous edit		Revision as of 09:01, 20 September 2006 edit undo Bluebot (talk \| contribs) 349,597 edits Unicodifying Next edit →
Line 4: The standard version of CYK recognizes languages defined by context-free grammars written in [[Chomsky normal form]] (CNF). Since any context-free grammar can be converted to CNF without too much difficulty, CYK can be used to recognize any context-free language. It is also possible to extend the CYK algorithm to handle some context-free grammars which are not written in CNF; this may be done to improve performance, although at the cost of making the algorithm harder to understand. The worst case [[asymptotic]] time complexity of CYK is [[Big O notation\|~~Θ~~Θ]](n<sup>3</sup>), where ''n'' is the length of the parsed string. This makes it one of the most efficient (in those terms) algorithms for recognizing any context-free language. However, there are other algorithms that will perform better for certain subsets of the context-free languages. ==The algorithm== Line 16: '''Let''' P[n,n,r] be an array of booleans. Initialize all elements of P to false. '''For each''' i = 1 to n '''For each''' unit production R<sub>j</sub> ~~→~~→ a<sub>i</sub>, '''set''' P[i,1,j] = true. '''For each''' i = 2 to n -- Length of span '''For each''' j = 1 to n-i+1 -- Start of span Line 28: ===Informally=== In informal terms, this algorithm considers every possible substring of the string of letters and sets P[i,j,k] to be true if the substring of letters starting from i of length j can be generated from R<sub>k</sub>. Once it has considered substrings of length 1, it goes on to substrings of length 2, and so on. For substrings of length 2 and greater, it considers every possible partition of the substring into two halves, and checks to see if there is some production P ~~→~~→ Q R such that Q matches the first half and R matches the second half. If so, it records P as matching the whole substring. Once this process is completed, the sentence is recognized by the grammar if the substring containing the entire string is matched by the start symbol. {\| class="wikitable" Line 61: * T. Kasami (1965). An efficient recognition and syntax-analysis algorithm for context-free languages. Scientific report AFCRL-65-758, Air Force Cambridge Research Lab, Bedford, MA. * Daniel H. Younger (1967). Recognition and parsing of context-free languages in time ''n''<sup>3</sup>. ''Information and Control'' 10(2): 189–208. [[Category:Parsing algorithms]]

CYK algorithm: Difference between revisions