Sequitur algorithm: Difference between revisions

== Method summary ==
The algorithm scans a sequence of [[terminal symbol]]s and maintains a list of every pair of adjacent symbols it has read. Whenever a second occurrence of a pair is discovered, both occurrences are replaced in the sequence by a newly invented [[nonterminal symbol]], the list of symbol pairs is updated to match the new sequence, and scanning continues. If a nonterminal symbol is then used only once, namely in the definition of the just-created symbol, that use is replaced by the symbol's definition and the symbol is removed from the set of defined nonterminal symbols. Once scanning is complete, the transformed sequence can be interpreted as the top-level rule of a grammar for the original sequence. The rule definitions for the nonterminal symbols it contains can be found in the list of symbol pairs. Those definitions may themselves contain further nonterminal symbols, whose definitions can likewise be read from the list of symbol pairs.<ref>[https://github.com/GrammarViz2/grammarviz2_src GrammarViz 2.0 – Sequitur and parallel Sequitur implementations in Java, Sequitur-based time series patterns discovery]</ref>
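
The steps above can be illustrated with a minimal Python sketch. It is a simplified illustration under stated assumptions, not the linear-time procedure used by real implementations: after each new symbol it rescans the whole grammar instead of maintaining an incremental index of pairs. The function names (<code>sequitur_sketch</code>, <code>find_repeated_pair</code>, <code>fix_repeated_pair</code>, <code>inline_single_use</code>, <code>expand</code>) are hypothetical, terminal symbols are assumed to be non-integer values such as characters, and nonterminal symbols are represented as integers with rule <code>0</code> as the top-level rule.

```python
from collections import Counter
from itertools import count

TOP = 0  # assumed convention: rule 0 is the top-level rule


def find_repeated_pair(rules):
    """Return two non-overlapping (rule, index) positions of one pair, or None."""
    seen = {}
    for name, body in rules.items():
        for i in range(len(body) - 1):
            pair = (body[i], body[i + 1])
            if pair not in seen:
                seen[pair] = (name, i)
            elif seen[pair] != (name, i - 1):  # ignore overlaps such as "aaa"
                return seen[pair], (name, i)
    return None


def fix_repeated_pair(rules, ids):
    """Replace one repeated pair by a nonterminal; return True if a change was made."""
    hit = find_repeated_pair(rules)
    if hit is None:
        return False
    (r1, i1), (r2, i2) = hit
    pair = rules[r1][i1:i1 + 2]

    def is_whole_rule(r):
        return r != TOP and len(rules[r]) == 2

    if is_whole_rule(r1):        # the pair already has a rule: reuse it
        rules[r2][i2:i2 + 2] = [r1]
    elif is_whole_rule(r2):
        rules[r1][i1:i1 + 2] = [r2]
    else:                        # invent a new nonterminal for the pair
        new = next(ids)
        rules[r2][i2:i2 + 2] = [new]  # later occurrence first, so i1 stays valid
        rules[r1][i1:i1 + 2] = [new]
        rules[new] = pair
    return True


def inline_single_use(rules):
    """Inline one nonterminal that is used exactly once; return True if done."""
    uses = Counter(s for body in rules.values() for s in body if isinstance(s, int))
    for name in list(rules):
        if name != TOP and uses[name] == 1:
            definition = rules.pop(name)
            for body in rules.values():
                if name in body:
                    k = body.index(name)
                    body[k:k + 1] = definition
                    return True
    return False


def sequitur_sketch(sequence):
    """Build a grammar {nonterminal: body} for a sequence of terminals."""
    rules = {TOP: []}
    ids = count(1)  # supplies fresh nonterminal names 1, 2, 3, ...
    for symbol in sequence:
        rules[TOP].append(symbol)
        # Rewrite until no pair repeats and no rule is used only once.
        while fix_repeated_pair(rules, ids) or inline_single_use(rules):
            pass
    return rules


def expand(rules, symbol=TOP):
    """Expand a symbol back into the list of terminals it stands for."""
    if not isinstance(symbol, int):
        return [symbol]
    return [t for s in rules[symbol] for t in expand(rules, s)]


grammar = sequitur_sketch("abcabc")
print(grammar)                   # {0: [2, 2], 2: ['a', 'b', 'c']}
print("".join(expand(grammar)))  # abcabc
```

For the input <code>abcabc</code>, the sketch first creates a rule for the pair <code>ab</code>, then a rule for the pair formed by that nonterminal and <code>c</code>. The first rule is then used only once, so it is inlined and removed, leaving a top-level rule that references a single rule for <code>abc</code> twice.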
 
==See also==