CYK algorithm: Difference between revisions

Revision as of 18:12, 4 April 2022 by Rodw (minor edit: Disambiguating links to John Cocke (disambiguation) (link changed to John Cocke (computer scientist)) using DisamAssist), compared with the latest revision as of 03:56, 17 July 2025 by Citation bot (Removed URL that duplicated identifier).
(22 intermediate revisions by 15 users not shown)
Line 1:
{{Short description|Parsing algorithm for context-free grammars}}
{{Redirect|CYK||Cyk (disambiguation)}}
{{Infobox algorithm
|name=Cocke–Younger–Kasami algorithm (CYK)
|class=[[Parsing]] with [[context-free grammar]]s
|data=[[String (computer science)|String]]
|time=<math>\mathcal{O}\left( n^3 \cdot \left| G \right| \right)</math>, where:
* <math>n</math> is the length of the string
* <math>|G|</math> is the size of the CNF grammar
}}
In [[computer science]], the '''Cocke–Younger–Kasami algorithm''' (alternatively called '''CYK''', or '''CKY''') is a [[parsing]] [[algorithm]] for [[context-free grammar]]s published by Itiroo Sakai in 1961.<ref>{{cite book |last1=Grune |first1=Dick |title=Parsing techniques : a practical guide |date=2008 |publisher=Springer |location=New York |page=579 |isbn=978-0-387-20248-8 |edition=2nd}}</ref><ref>Itiroo Sakai, “Syntax in universal translation”. In Proceedings 1961 International Conference on Machine Translation of Languages and Applied Language Analysis, Her Majesty’s Stationery Office, London, pp. 593–608, 1962.</ref> The algorithm is named after some of its rediscoverers: [[John Cocke (computer scientist)|John Cocke]], Daniel Younger, [[Tadao Kasami]], and [[Jacob T. Schwartz]]. It employs [[bottom-up parsing]] and [[dynamic programming]].
The standard version of CYK operates only on context-free grammars given in [[Chomsky normal form]] (CNF). However, any context-free grammar may be algorithmically transformed into a CNF grammar expressing the same language {{harv|Sipser|1997}}.

The importance of the CYK algorithm stems from its high efficiency in certain situations. Using [[Big O notation|big ''O'' notation]], the [[Analysis of algorithms|worst-case running time]] of CYK is <math>\mathcal{O}\left( n^3 \cdot \left| G \right| \right)</math>, where <math>n</math> is the length of the parsed string and <math>\left| G \right|</math> is the size of the CNF grammar <math>G</math> {{harv|Hopcroft|Ullman|1979|p=140}}. This makes it one of the most efficient {{Citation needed|reason=cubic time does not seem efficient at all; other algorithms claim linear execution time|date=August 2023}} parsing algorithms for arbitrary context-free grammars in terms of worst-case [[asymptotic complexity]], although other algorithms exist with better average running time in many practical scenarios.
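To make the O(n³·|G|) bound concrete, here is a minimal Python sketch of a CYK recognizer for a grammar that is already in CNF. It is illustrative only: the function name cyk_recognize, the encoding of terminal rules as a dict and of binary rules as (A, B, C) triples, and the toy grammar for { a^k b^k : k ≥ 1 } are assumptions made for this example, not taken from the cited sources. Indices are 0-based. The three loops over span length, span start and split point each run at most ''n'' times, and the innermost loop visits each binary production once, which is where the cubic-times-grammar-size cost comes from.

```python
def cyk_recognize(words, unit_rules, binary_rules, start):
    """Decide whether `words` can be derived from `start` in a CNF grammar.

    unit_rules:   dict mapping a terminal a to the set of nonterminals A with A -> a
    binary_rules: list of (A, B, C) triples, one per production A -> B C
    """
    n = len(words)
    if n == 0:
        # This sketch assumes the grammar has no explicit S -> epsilon rule.
        return False
    # table[l][s]: set of nonterminals deriving the span of length l starting at s
    table = [[set() for _ in range(n)] for _ in range(n + 1)]
    for s, word in enumerate(words):
        table[1][s] = set(unit_rules.get(word, ()))
    for l in range(2, n + 1):              # length of span
        for s in range(n - l + 1):         # start of span
            for p in range(1, l):          # partition of span
                left, right = table[p][s], table[l - p][s + p]
                for a, b, c in binary_rules:   # every production A -> B C
                    if b in left and c in right:
                        table[l][s].add(a)
    return start in table[n][0]


# Hypothetical toy CNF grammar: S -> A B | A T, T -> S B, A -> a, B -> b
UNIT_RULES = {"a": {"A"}, "b": {"B"}}
BINARY_RULES = [("S", "A", "B"), ("S", "A", "T"), ("T", "S", "B")]

print(cyk_recognize(list("aabb"), UNIT_RULES, BINARY_RULES, "S"))  # True
print(cyk_recognize(list("abab"), UNIT_RULES, BINARY_RULES, "S"))  # False
```

This sketch only decides membership; storing a backpointing triple each time a nonterminal is added to a cell would make it possible to recover parse trees as well.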
==Standard form==
The [[dynamic programming]] algorithm requires the context-free grammar to be rendered into [[Chomsky normal form]] (CNF), because it tests for possibilities to split the current sequence into two smaller sequences. Any context-free grammar that does not generate the empty string can be represented in CNF using only [[Formal grammar#The syntax of grammars|production rules]] of the forms <math>A\rightarrow \alpha</math> and <math>A\rightarrow B C</math>, where <math>\alpha</math> is a terminal symbol and <math>B</math> and <math>C</math> are nonterminals; to allow for the empty string, one can explicitly allow <math>S\to \varepsilon</math>, where <math>S</math> is the start symbol.<ref>{{Cite book |last=Sipser |first=Michael |title=Introduction to the theory of computation |date=2006 |publisher=Thomson Course Technology |isbn=0-534-95097-3 |edition=2nd |location=Boston |at=Definition 2.8 |oclc=58544333}}</ref>

==Algorithm==
Line 17 ⟶ 28:
 '''let''' the grammar contain ''r'' nonterminal symbols ''R''<sub>1</sub> ... ''R''<sub>''r''</sub>, with start symbol ''R''<sub>1</sub>.
 '''let''' ''P''[''n'',''n'',''r''] be an array of booleans. Initialize all elements of ''P'' to false.
 '''let''' ''back''[''n'',''n'',''r''] be an array of lists of backpointing triples. Initialize all elements of ''back'' to the empty list.
 '''for each''' ''s'' = 1 to ''n''
Line 26 ⟶ 38:
         '''for each''' ''p'' = 1 to ''l''-1 ''-- Partition of span''
             '''for each''' production ''R''<sub>''a''</sub> → ''R''<sub>''b''</sub> ''R''<sub>''c''</sub>
                 '''if''' ''P''[''p'',''s'',''b''] and ''P''[''l''-''p'',''s''+''p'',''c''] '''then'''
                     '''set''' ''P''[''l'',''s'',''a''] = true,
                     append <p,b,c> to ''back''[''l'',''s'',''a'']
 
 '''if''' ''P''[n,''1'',''1''] is true '''then'''
     ''I'' is a member of the language
     '''return''' ''back'' -- by ''retracing the steps through back, one can easily construct all possible parse trees of the string.''
 '''else'''
     '''return''' "not a member of language"

<div class="toccolours mw-collapsible mw-collapsed">
Line 51 ⟶ 66:
             '''for each''' production ''R''<sub>''a''</sub> → ''R''<sub>''b''</sub> ''R''<sub>''c''</sub>
                 prob_splitting = Pr(''R''<sub>''a''</sub> → ''R''<sub>''b''</sub> ''R''<sub>''c''</sub>) * ''P''[''p'',''s'',''b''] * ''P''[''l''-''p'',''s''+''p'',''c'']
                 '''if''' prob_splitting > ''P''[''l'',''s'',''a''] '''then'''
                     '''set''' ''P''[''l'',''s'',''a''] = prob_splitting
                     '''set''' ''back''[''l'',''s'',''a''] = <p,b,c>
 
 '''if''' ''P''[n,''1'',''1''] > 0 '''then'''
     find the parse tree by retracing through ''back''
     '''return''' the parse tree
 '''else'''
     '''return''' "not a member of language"
</div>
</div>
Line 116 ⟶ 137:
===Parsing weighted context-free grammars===
It is also possible to extend the CYK algorithm to parse strings using [[weighted context-free grammar|weighted]] and [[stochastic context-free grammar]]s. Weights (probabilities) are then stored in the table P instead of booleans, so P[i,j,A] will contain the minimum weight (maximum probability) with which the substring from i to j can be derived from A. Further extensions of the algorithm allow all parses of a string to be enumerated from lowest to highest weight (highest to lowest probability).
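The probabilistic table update can be sketched in Python as follows. This is a minimal, illustrative sketch rather than a reference implementation: the function name probabilistic_cyk, the dict-based rule encoding and the toy grammar are hypothetical. For each span and nonterminal it keeps the best score and a single backpointer (p, B, C), mirroring ''P'' and ''back'' in the probabilistic pseudocode, but it sums log-probabilities instead of multiplying raw probabilities so that long inputs do not underflow. It assumes every rule probability is strictly positive, since log(0) is undefined.

```python
import math


def probabilistic_cyk(words, unit_rules, binary_rules, start):
    """Return (log-probability, tree) of the most probable parse, or None.

    unit_rules:   dict mapping (A, a) to Pr(A -> a), for terminals a
    binary_rules: dict mapping (A, B, C) to Pr(A -> B C)
    All probabilities are assumed to be > 0.
    """
    n = len(words)
    if n == 0:
        return None
    # best[l][s][A]: highest log-probability of deriving span (l, s) from A
    best = [[{} for _ in range(n)] for _ in range(n + 1)]
    back = [[{} for _ in range(n)] for _ in range(n + 1)]
    for s, word in enumerate(words):
        for (a, terminal), prob in unit_rules.items():
            if terminal == word:
                best[1][s][a] = math.log(prob)
    for l in range(2, n + 1):                  # length of span
        for s in range(n - l + 1):             # start of span
            for p in range(1, l):              # partition of span
                left, right = best[p][s], best[l - p][s + p]
                for (a, b, c), prob in binary_rules.items():
                    if b in left and c in right:
                        # Adding logs corresponds to multiplying probabilities.
                        score = math.log(prob) + left[b] + right[c]
                        if score > best[l][s].get(a, -math.inf):
                            best[l][s][a] = score
                            back[l][s][a] = (p, b, c)
    if start not in best[n][0]:
        return None

    def build(l, s, a):
        # Retrace the stored backpointers into a nested-tuple parse tree.
        if l == 1:
            return (a, words[s])
        p, b, c = back[l][s][a]
        return (a, build(p, s, b), build(l - p, s + p, c))

    return best[n][0][start], build(n, 0, start)


# Hypothetical toy grammar in which "aaa" has two parses of different probability.
PCFG_UNIT = {("A", "a"): 1.0}
PCFG_BINARY = {("S", "X", "A"): 0.6, ("S", "A", "Y"): 0.4,
               ("X", "A", "A"): 1.0, ("Y", "A", "A"): 1.0}

log_prob, tree = probabilistic_cyk(list("aaa"), PCFG_UNIT, PCFG_BINARY, "S")
print(round(math.exp(log_prob), 6))  # 0.6
print(tree)  # ('S', ('X', ('A', 'a'), ('A', 'a')), ('A', 'a'))
```

Keeping only one backpointer per cell, as here, yields a single most probable tree; enumerating further parses would require storing more than one candidate per cell.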
==== Numerical stability ====
When the probabilistic CYK algorithm is applied to a long string, the splitting probability can become very small, possibly underflowing floating-point arithmetic, because many probabilities are multiplied together. This can be dealt with by summing log-probabilities instead of multiplying probabilities.

===Valiant's algorithm===
Line 134 ⟶ 158:
== Sources ==
* {{cite conference |title= Syntax in universal translation |last= Sakai |first= Itiroo |date= 1962 |location= London |publisher= Her Majesty’s Stationery Office |volume= II |pages= 593–608 |conference= 1961 International Conference on Machine Translation of Languages and Applied Language Analysis, Teddington, England}}
* {{cite tech report |last1=Cocke |first1=John |author-link1=John Cocke (computer scientist) |last2=Schwartz |first2=Jacob T. |date=April 1970 |title=Programming languages and their compilers: Preliminary notes |edition=2nd revised |publisher=[[Courant Institute of Mathematical Sciences|CIMS]], [[New York University|NYU]] |url=http://www.softwarepreservation.org/projects/FORTRAN/CockeSchwartz_ProgLangCompilers.pdf}}
* {{cite book |isbn=0-201-02988-X |first1=John E. |last1=Hopcroft |author1-link=John E. Hopcroft |first2=Jeffrey D. |last2=Ullman |author2-link=Jeffrey D. Ullman |title=Introduction to Automata Theory, Languages, and Computation |location=Reading/MA |publisher=Addison-Wesley |year=1979 |url=https://archive.org/details/introductiontoau00hopc}}
* {{cite tech report |last1=Kasami |first1=T. |author-link1=Tadao Kasami |year=1965 |title=An efficient recognition and syntax-analysis algorithm for context-free languages |number=65-758 |publisher=[[Air Force Cambridge Research Laboratories|AFCRL]]}}
* {{cite book |last1=Knuth |first1=Donald E. |author-link1=Donald Knuth |title=The Art of Computer Programming Volume 2: Seminumerical Algorithms |publisher=Addison-Wesley Professional |edition=3rd |date=November 14, 1997 |isbn=0-201-89684-2 |page=501}}
* {{cite journal |last1=Lang |first1=Bernard |title=Recognition can be harder than parsing |journal=[[Computational Intelligence (journal)|Comput. Intell.]] |year=1994 |volume=10 |issue=4 |pages=486–494 |citeseerx=10.1.1.50.6982 |doi=10.1111/j.1467-8640.1994.tb00011.x |s2cid=5873640}}
Line 146 ⟶ 170:
==External links==
* [https://raw.org/tool/cyk-algorithm/ Interactive Visualization of the CYK algorithm]
* [https://martinlaz.github.io/demos/cky.html CYK parsing demo in JavaScript]
* [https://www.swisseduc.ch/informatik/exorciser/ Exorciser is a Java application to generate exercises in the CYK algorithm as well as Finite State Machines, Markov algorithms, etc.]
{{Parsers}}