{{Short description|Markov-based processes with variable "memory"}}
In the mathematical theory of [[stochastic process]]es, '''variable-order Markov (VOM) models''' are an important class of models that extend the well known [[Markov chain]] models. In contrast to the Markov chain models, where each [[random variable]] in a sequence with the [[Markov property]] depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization.
This realization sequence is often called the ''context''; therefore the VOM models are also called ''context trees''.<ref name="Rissanen">{{cite journal|last = Rissanen|first = J.|title = A Universal Data Compression System|journal = IEEE Transactions on Information Theory|volume = 29|issue = 5|date = Sep 1983|pages = 656–664}}</ref>
==Example==
Consider for example a sequence of [[random variable]]s, each of which takes a value from the ternary [[alphabet]] {{math|{{mset|''a'', ''b'', ''c''}}}}. Specifically, consider the string constructed from infinite concatenations of the sub-string {{math|''aaabc''}}: {{math|''aaabcaaabcaaabcaaabc''…''aaabc''}}.
The VOM model of maximal order 2 can approximate the above string using ''only'' the following five [[conditional probability]] components: {{math|1=Pr(''a'' {{!}} ''aa'') = 0.5}}, {{math|1=Pr(''b'' {{!}} ''aa'') = 0.5}}, {{math|1=Pr(''c'' {{!}} ''b'') = 1}}, {{math|1=Pr(''a'' {{!}} ''c'') = 1}}, {{math|1=Pr(''a'' {{!}} ''ca'') = 1}}.
In this example, {{math|1=Pr(''c'' {{!}} ''ab'') = Pr(''c'' {{!}} ''b'') = 1}}; therefore, the shorter context {{math|''b''}} is sufficient to determine the next character.
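These five components can be checked by direct counting. The following Python sketch is illustrative only (the sample length and the <code>prob</code> helper are arbitrary choices, not part of any published VOM construction): it estimates each component from a long sample of the string.

<syntaxhighlight lang="python">
from collections import Counter, defaultdict

# A long sample of the string built by repeating the sub-string "aaabc".
s = "aaabc" * 1000

# For each context of length 1 or 2, count how often each symbol follows it.
counts = defaultdict(Counter)
for order in (1, 2):
    for i in range(order, len(s)):
        counts[s[i - order:i]][s[i]] += 1

def prob(symbol, context):
    """Empirical estimate of Pr(symbol | context)."""
    total = sum(counts[context].values())
    return counts[context][symbol] / total if total else 0.0

print(prob("a", "aa"))  # 0.5
print(prob("b", "aa"))  # 0.5
print(prob("c", "b"))   # 1.0
print(prob("a", "c"))   # 1.0
print(prob("a", "ca"))  # 1.0
</syntaxhighlight>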
To construct the [[Markov chain]] of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: {{math|Pr(''a'' {{!}} ''a'')}}, {{math|Pr(''a'' {{!}} ''b'')}}, {{math|Pr(''a'' {{!}} ''c'')}}, {{math|Pr(''b'' {{!}} ''a'')}}, {{math|Pr(''b'' {{!}} ''b'')}}, {{math|Pr(''b'' {{!}} ''c'')}}, {{math|Pr(''c'' {{!}} ''a'')}}, {{math|Pr(''c'' {{!}} ''b'')}}, {{math|Pr(''c'' {{!}} ''c'')}}. To construct the Markov chain of order 2, one must estimate 27 conditional probability components, and for the chain of order 3, 81 components.
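In general, a Markov chain of order {{math|''D''}} over an alphabet of {{math|''k''}} symbols has {{math|''k''<sup>''D''</sup>}} contexts, each with {{math|''k''}} next-symbol probabilities, i.e. {{math|''k''<sup>''D''+1</sup>}} components in total. A one-line check (illustrative only) reproduces the counts above for {{math|1=''k'' = 3}}:

<syntaxhighlight lang="python">
# Components of a fixed-order Markov chain over a ternary alphabet:
# 3**order contexts, each with 3 next-symbol probabilities.
for order in (1, 2, 3):
    print(order, 3 ** order * 3)  # prints: 1 9, then 2 27, then 3 81
</syntaxhighlight>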
In practical settings there is seldom sufficient data to accurately estimate the [[exponential growth|exponentially increasing]] number of conditional probability components as the order of the Markov chain increases.
==Definition==
Let <math>A</math> be a state space (finite [[Alphabet (formal languages)|alphabet]]) of size <math>|A|</math>.
Consider a sequence with the [[Markov property]] <math>x_1^{n}=x_1x_2\dots x_n</math> of <math>n</math> realizations of [[random variable]]s, where <math>x_i\in A</math> is the state (symbol) at position <math>i</math> (<math>1\le i\le n</math>), and the concatenation of states <math>x_i</math> and <math>x_{i+1}</math> is denoted by <math>x_ix_{i+1}</math>.
Given a training set of observed states, <math>x_1^{n}</math>, the construction algorithm of the VOM models<ref name="Shmilovici"/><ref name="Begleiter"/><ref name="Ben-Gal"/> learns a model <math>P</math> that provides a [[probability]] assignment for each state in the sequence given its past (previously observed symbols) or future states.
Specifically, the learner generates a [[conditional distribution|conditional probability distribution]] <math>P(x_i\mid s)</math> for a symbol <math>x_i\in A</math> given a context <math>s\in A^*</math>, where the * sign represents a sequence of states of any length, including the empty context.
VOM models attempt to estimate [[conditional distribution]]s of the form <math>P(x_i\mid s)</math>, where the context length <math>|s|\le D</math> varies depending on the available statistics.
In contrast, conventional [[Markov chain|Markov models]] attempt to estimate these [[conditional distribution]]s by assuming a fixed context length <math>|s|=D</math> and, hence, can be considered as special cases of the VOM models.
Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order [[Markov chain|Markov models]], which leads to a better [[Bias–variance tradeoff|variance–bias tradeoff]] of the learned models.<ref name="Shmilovici"/><ref name="Begleiter"/><ref name="Ben-Gal"/>
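The construction can be illustrated with a minimal sketch. In the following Python code, the function names, the plain counting scheme, and the <code>min_count</code> back-off threshold are ad hoc choices for illustration only, not any of the published algorithms (such as [[prediction by partial matching]] or [[context tree weighting]]): every context of length at most {{math|''D''}} is counted during training, and prediction backs off from the longest matching context to shorter ones until enough statistics are available.

<syntaxhighlight lang="python">
from collections import Counter, defaultdict

def train_vom(seq, max_order=2):
    """Count next-symbol frequencies for every context of length <= max_order."""
    counts = defaultdict(Counter)
    for i in range(len(seq)):
        for order in range(max_order + 1):  # order 0 is the empty context
            if i - order < 0:
                break
            counts[seq[i - order:i]][seq[i]] += 1
    return counts

def predict(counts, history, max_order=2, min_count=2):
    """Estimate the next-symbol distribution, backing off to shorter contexts.

    The longest suffix of the history observed at least min_count times is
    used; the empty context always matches, so the loop terminates.
    """
    for order in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - order:]
        total = sum(counts[context].values())
        if total >= min_count:
            return {sym: n / total for sym, n in counts[context].items()}
    return {}

model = train_vom("aaabc" * 1000)
print(predict(model, "ab"))  # {'c': 1.0}
print(predict(model, "aa"))  # {'a': 0.5, 'b': 0.5}
</syntaxhighlight>

Because the context length is chosen per prediction, contexts that add no information beyond a shorter suffix (such as ''ab'' versus ''b'' in the example above) need not be stored separately by more refined learners; this sketch keeps all of them for simplicity.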
==Application areas==
Various efficient algorithms have been devised for estimating the parameters of the VOM model.<ref name="Begleiter"/>
VOM models have been successfully applied to areas such as [[machine learning]], [[information theory]] and [[bioinformatics]], including specific applications such as [[code|coding]] and [[data compression]],<ref name="Rissanen"/> document compression,<ref name="Begleiter"/> classification and identification of [[DNA]] and [[protein|protein sequences]], [[statistical process control]], [[spam filtering]], [[haplotype|haplotyping]], [[speech recognition]], sequence analysis in [[social science]]s, and others.
==See also==
* [[Stochastic chains with memory of variable length]]
* [[Examples of Markov chains]]
* [[Variable order Bayesian network]]