Variable-order Markov model: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 09:27, 21 April 2007 edit 85.64.213.129 (talk) No edit summary ← Previous edit		Latest revision as of 16:20, 27 August 2025 edit undo Citation bot (talk \| contribs) Bots 5,866,407 edits Removed URL that duplicated identifier. \| Use this bot. Report bugs. \| Suggested by Headbomb \| Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox \| #UCB_webform_linked 818/967
(83 intermediate revisions by 39 users not shown)
Line 1: {{Short description\|Markov-based processes with variable "memory"}} ~~{{Wikify\|date=April 2007}}~~ In the mathematical theory of [[stochastic processes]], '''~~Variable~~ variable-order Markov (VOM) models''' ~~Models~~ are an important class of models that extend the well known [[Markov ~~Chain~~chain]] ~~Models~~models. In contrast to the [[Markov ~~Chain]]~~chain ~~Models~~models, where each [[random variable ]] in a sequence with a [[Markov property]] depends on a fixed number of [[random ~~variable]]s~~variables, in VOM models this number of conditioning [[random ~~variable]]s~~variables may vary based on the specific observed realization. ▼ This realization sequence is often called the ''context''; therefore the VOM models are also called ''context trees''.<ref name="Rissanen">{{cite journal\|last = Rissanen\|first = J.\|title = A Universal Data Compression System\|journal = IEEE Transactions on Information Theory\|volume = 29\|issue = 5\|date = Sep 1983\|pages = 656–664\|doi = 10.1109/TIT.1983.1056741}}</ref> VOM models are nicely rendered by colorized probabilistic suffix trees (PST).<ref name=":0">{{Cite journal\|last1=Gabadinho\|first1=Alexis\|last2=Ritschard\|first2=Gilbert\|date=2016\|title=Analyzing State Sequences with Probabilistic Suffix Trees: The PST R Package\|url=http://www.jstatsoft.org/v72/i03/\|journal=Journal of Statistical Software\|language=en\|volume=72\|issue=3\|doi=10.18637/jss.v072.i03\|s2cid=63681202 \|issn=1548-7660\|doi-access=free}}</ref> The flexibility in the number of conditioning random variables turns out to be of real advantage for many applications, such as [[statistical analysis]], [[Statistical classification\|classification]] and [[prediction]].<ref name="Shmilovici">{{cite journal\|last = Shmilovici\|first = A.\|author2=Ben-Gal, I. \|title = Using a VOM Model for Reconstructing Potential Coding Regions in EST Sequences\|journal = Computational Statistics\|volume = 22\|issue = 1\|year = 2007\|pages = 49–69\|doi = 10.1007/s00180-007-0021-8\| s2cid=2737235 }}</ref><ref name="Begleiter">{{cite journal\|last = Begleiter\|first = R.\|author2 = El-Yaniv, R.\|author3 = Yona, G.\|title = On Prediction Using Variable Order Markov models\|journal = Journal of Artificial Intelligence Research\|volume = 22\|year = 2004\|pages = 385–421\|doi = 10.1613/jair.1491\|doi-access = free\|arxiv = 1107.0051}}</ref><ref name="Ben-Gal">{{cite journal\|last=Ben-Gal\|first=I.\|author2=Morag, G.\|author3=Shmilovici, A.\|year=2003\|title=Context-Based Statistical Process Control: A Monitoring Procedure for State-Dependent Processes\|url=http://www.eng.tau.ac.il/~bengal/Technometrics_final.pdf\|journal=Technometrics\|volume=45\|issue=4\|pages=293–311\|doi=10.1198/004017003000000122\|s2cid=5227793 \|issn=0040-1706}}</ref> ▲'''Variable order Markov (VOM)''' Models are an important class of models that extend the well known [[Markov Chain]] Models. In contrast to the [[Markov Chain]] Models, where each random variable in a sequence with a [[Markov property]] depends on a fixed number of [[random variable]]s, in VOM models this number of conditioning [[random variable]]s may vary based on the specific observed realization. ==Example== This realization sequence is often called the ''context'' and thus the VOM models are also called ''Context Trees'' [1]. The flexibility in the number of conditioning [[random variable]]s turns out to be of real advantage for many applications, such as statistical analysis, classification and prediction. Consider for example a sequence of [[random variable]]s, each of which takes a value from the ternary [[alphabet]] {{math\|{{mset\|''a'', ''b'', ''c''}}}}. Specifically, consider the string constructed from infinite concatenations of the sub-string {{math\|''aaabc''}}: {{math\|''aaabcaaabcaaabcaaabc…aaabc''}}.▼ ~~==Definition==~~ Let <math>A</math> be a state space (finite [[alphabet]]) of size <nowiki>\|A\|</nowiki>. ▼ The VOM model of maximal order 2 can approximate the above string using ''only'' the following ~~four~~five [[conditional probability]] components: {{math\|Pr(''a\|'' {{!}} ''aa'') {{=}} 0.5}}, {{math\|Pr(''b\|'' {{!}} ''aa'') {{=}} 0.5}}, {{math\|Pr(''c\|'' {{!}} ''b'') {{=}} 1.0}}, {{math\|Pr(''a\|'' {{!}} ''c''){{=}} 1.0}.}, {{math\|Pr(''a'' {{!}} ''ca'') {{=}} 1.0}}.▼ Consider a sequence with the [[Markov chain\|Markov property]] <math>x_1^{n}=x_1x_2...x_n</math> of <math>n</math> realizations of [[random variable]]s, where <math> x_i\in A</math> is the state (symbol) at position <math>i</math> 1≤<math>i</math>≤<math>n</math>, and the concatenation of states <math>x_i</math> and <math>x_{i+1}</math> is denoted by <math>x_ix_{i+1}</math>.▼ In this example, {{math\|Pr(''c\|'' {{!}} ''ab'') {{=}} Pr(''c\|'' {{!}} ''b'') {{=}} 1.0,}}; therefore, the shorter context {{math\|''b''}} is sufficient to determine the ~~future~~next character. Similarly, the VOM model of maximal order 3 can ~~approximate~~generate the string exactly using only ~~four~~five [[conditional probability]] components, which are all equal to 1.0.▼ Given a training set of observed states, <math>x_1^{n}</math>, the construction algorithm of the VOM models [2,3,4] learns a model <math>P</math> that provides a [[probability]] assignment for each state in the sequence given its past (previously observed symbols) or future states.▼ To construct the [[Markov chain]] of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: {{math\|Pr(''a'' {{!}} ''a'')}}, {{math\|Pr(''a'' {{!}} ''b'')}}, {{math\|Pr(''a'' {{!}} ''c'')}}, {{math\|Pr(''b'' {{!}} ''a'')}}, {{math\|Pr(''b'' {{!}} ''b'')}}, {{math\|Pr(''b'' {{!}} ''c'')}}, {{math\|Pr(''c'' {{!}} ''a'')}}, {{math\|Pr(''c'' {{!}} ''b'')}}, {{math\|Pr(''c'' {{!}} ''c'')}}. To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: {{math\|Pr(''a'' {{!}} ''aa'')}}, {{math\|Pr(''a'' {{!}} ''ab'')}}, {{math\|…}}, {{math\|Pr(''c'' {{!}} ''cc'')}}. And to construct the Markov chain of order three for the next character one must estimate the following 81 conditional probability components: {{math\|Pr(''a'' {{!}} ''aaa'')}}, {{math\|Pr(''a'' {{!}} ''aab'')}}, {{math\|…}}, {{math\|Pr(''c'' {{!}} ''ccc'')}}. Specifically, the learner generates a [[conditional distribution\|conditional probability distribution]] <math>P(x_i\|s)</math> for a symbol <math>x_i \in A</math> given a context <math>s\in A^</math>, where the sign represents a sequence of states of any length, including the empty context. ▼ In practical settings there is seldom sufficient data to accurately estimate the [[exponential growth\|~~exponential~~exponentially ~~growing~~increasing]] number of [[conditional probability]] components as the order of the [[Markov chain]] increases.▼ VOM models attempt to estimate [[conditional distribution]]s of the form <math>P(x_i\|s)</math> where the context length \|<math>s</math>\|≤<math>D</math> varies depending on the available statistics. ▼ In contrast, conventional [[Markov chain\|Markov models]] attempt to estimate these [[conditional distribution]]s by assuming a fixed contexts' length \|<math>s</math>\|=<math>D</math> and, hence, can be considered as special cases of the VOM models. ▼ The ~~Variable Order~~variable-order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent from the future states,; accordingly, ''"a great reduction in the number of model parameters can be achieved''."<ref name="Rissanen"/>▼ Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order [[Markov chain\|Markov Models]] that leads to a better [[variance]]-bias tradeoff of the learned models [2,3,4].▼ ==~~Example~~Definition== ▲Let ~~<math>~~{{mvar\|A~~</math>~~}} be a state space (finite [[Alphabet (formal languages)\|alphabet]]) of size <~~nowiki~~math>\|A\|</~~nowiki~~math>. ▲Consider for example a sequence of [[random variable]]s each of which takes value from the ternary [[alphabet]] {''a,b,c''}. ▲Consider a sequence with the [[~~Markov chain\|~~Markov property]] <math>x_1^{n}=x_1x_2~~...~~\dots x_n</math> of ~~<math>~~{{mvar\|n~~</math>~~}} realizations of [[random variable]]s, where <math> x_i\in A</math> is the state (symbol) at position ~~<math>~~{{mvar\|i~~</math>~~}} 1≤<math>\scriptstyle (1 \le i~~</math>≤<math>~~ \le n)</math>, and the concatenation of states <math>x_i</math> and <math>x_{i+1}</math> is denoted by <math>x_ix_{i+1}</math>. ~~Consider for example the string ''aaabcaaabcaaabcaaabc…aaabc'' constructed from infinite concatenations of the sub-string ''aaabc''.~~ ▲The VOM model of maximal order 2 can approximate the above string using ''only'' the following four [[conditional probability]] components {Pr(a\|aa)=0.5, Pr(b\|aa)=0.5, Pr(c\|b)=1.0, Pr(a\|c)= 1.0}. ▲In this example, Pr(c\|ab)=Pr(c\|b)=1.0, therefore, the shorter context ''b'' is sufficient to determine the future character. Similarly, the VOM model of maximal order 3 can approximate the string using only four [[conditional probability]] components. ▲Given a training set of observed states, <math>x_1^{n}</math>, the construction algorithm of the VOM models<ref ~~[2,3,4]~~name="Shmilovici"/><ref name="Begleiter"/><ref name="Ben-Gal"/> learns a model ~~<math>~~{{mvar\|P~~</math>~~}} that provides a [[probability]] assignment for each state in the sequence given its past (previously observed symbols) or future states. ▲Specifically, the learner generates a [[conditional distribution\|conditional probability distribution]] <math>P(x_i\|\mid s)</math> for a symbol <math>x_i \in A</math> given a context <math>s\in A^</math>, where the sign represents a sequence of states of any length, including the empty context. To construct the [[Markov chain]] of order 1 for the next character in this sequence, one needs to estimate the following 9 [[conditional probability]] components {Pr(a\|a), Pr(a\|b), Pr(a\|c), Pr(b\|a), Pr(b\|a), Pr(b\|a), Pr(c\|a), Pr(c\|a), Pr(c\|a)}. ▲VOM models attempt to estimate [[conditional distribution]]s of the form <math>P(x_i\|\mid s)</math> where the context length \|<math>\|s~~</math>~~\|~~≤<math>~~ \le D</math> varies depending on the available statistics. ~~To construct the [[Markov chain]] of order 2 for the next character in this sequence, one needs to estimate the following 27 [[conditional probability]] components {Pr(a\|aa), Pr(a\|ab), …, Pr(c\|cc)}.~~ ▲In contrast, conventional [[Markov chain\|Markov models]] attempt to estimate these [[conditional distribution]]s by assuming a fixed contexts' length \|<math>\|s~~</math>~~\| =~~<math>~~ D</math> and, hence, can be considered as special cases of the VOM models. ▲Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order [[Markov chain\|Markov ~~Models~~models]] that leads to a better [[variance]]-bias tradeoff of the learned models ~~[2,3,4]~~.<ref name="Shmilovici"/><ref name="Begleiter"/><ref name="Ben-Gal"/> To construct the [[Markov chain]] of order three for the next character in this sequence, one needs to estimate the following 81 [[conditional probability]] components {Pr(a\|aaa), Pr(a\|aab), …, Pr(c\|ccc)}. ==Application ~~Areas~~areas==▼ Various efficient algorithms ~~were~~have been devised for estimating the parameters of the VOM model ~~[3]~~.<ref name="Begleiter"/>▼ VOM models have been successfully applied to areas such as [[machine learning]], [[information theory]] and [[bioinformatics]], including specific applications such as [[code\|coding]] and [[data compression]],<ref name="Rissanen"/> document compression,<ref name="Begleiter"/> classification and identification of [[DNA]] and [[protein\|protein sequences]],<ref>{{cite journal \|url= http://www.eng.tau.ac.il/~bengal/VOMBAT.pdf \|title= VOMBAT: Prediction of Transcription Factor Binding Sites using Variable Order Bayesian Trees \|author1= Grau J. \|author2= Ben-Gal I. \|author3= Posch S. \|author4= Grosse I. \|journal= Nucleic Acids Research \|publisher= Nucleic Acids Research, vol. 34, issue W529–W533. \|year= 2006 \|volume= 34 \|issue= Web Server issue \|pages= W529-33 \|doi= 10.1093/nar/gkl212 \|pmid= 16845064 \|pmc= 1538886 \|archive-date= 2018-09-30 \|access-date= 2014-01-10 \|archive-url= https://web.archive.org/web/20180930084306/http://www.eng.tau.ac.il/~bengal/VOMBAT.pdf \|url-status= dead }}</ref> [http://www.eng.tau.ac.il/~bengal/VOMBAT.pdf]<ref name="Shmilovici"/> [[statistical process control]],<ref name="Ben-Gal"/> [[spam filtering]],<ref name="Bratko">{{cite journal\|last = Bratko\|first = A. \|author2=Cormack, G. V. \|author3=Filipic, B. \|author4=Lynam, T. \|author5=Zupan, B.\|title = Spam Filtering Using Statistical Data Compression Models\|journal = Journal of Machine Learning Research\|volume = 7\|year = 2006\|pages = 2673–2698\|url = http://www.jmlr.org/papers/volume7/bratko06a/bratko06a.pdf}}</ref> [[haplotyping]],<ref>[[Sharon R. Browning\|Browning, Sharon R.]] "Multilocus association mapping using variable-length Markov chains." The American Journal of Human Genetics 78.6 (2006): 903–913.</ref> speech recognition,<ref>{{Cite book\|last1=Smith\|first1=A.\|last2=Denenberg\|first2=J.\|last3=Slack\|first3=T.\|last4=Tan\|first4=C.\|last5=Wohlford\|first5=R.\|title=ICASSP '85. IEEE International Conference on Acoustics, Speech, and Signal Processing \|chapter=Application of a sequential pattern learning system to connected speech recognition \|date=1985\|___location=Tampa, FL, USA\|publisher=Institute of Electrical and Electronics Engineers\|volume=10\|pages=1201–1204\|doi=10.1109/ICASSP.1985.1168282\|s2cid=60991068 }}</ref> [[sequence analysis in social sciences]],<ref name=":0" /> and others. ▲In practical settings there is seldom sufficient data to accurately estimate the [[exponential growth\|exponential growing]] number of [[conditional probability]] components as the order of the [[Markov chain]] increases. ▲The Variable Order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent from the future states, accordingly, ''a great reduction in the number of model parameters can be achieved''. ▲==Application Areas== ▲Various efficient algorithms were devised for estimating the parameters of the VOM model [3]. VOM models were successfully applied to areas such as [[Machine learning]], [[Information theory]] and [[Bioinformatics]] including specific applications such as [[code\|coding]] and [[data compression]] [1] document compression [3], classification and identification of [[DNA]] and [[protein\|protein sequences]] [2], [[Statistical process control]] [4] and more. ==See also== * [[Stochastic chains with memory of variable length]] * [[Markov chain]] * [[Examples of Markov chains]] * [[Variable order Bayesian network]] * [[Markov process]] * [[Markov chain Monte Carlo]] * [[Semi-Markov process]] * [[Artificial intelligence]] ==References== {{reflist}} [[Category:~~Statistics~~Markov models]]▼ ~~[1] Rissanen J. (1983). “A Universal Data Compression System”. IEEE Transactions on Information Theory, vol. 29, no. 5, pp.656-664.<br />~~ ~~[2] Shmilovici A., Ben-Gal I. (2007), “Using a VOM Model for Reconstructing Potential Coding Regions in EST Sequences”, Computational Statistics vol. 22, no. 1, 49-69.<br />~~ ~~[3] Begleiter R., El-Yaniv R., Yona G., (2004), "On Prediction Using Variable Order Markov Models", Journal of Artificial Intelligence, vol. 22, pp. 385-421.<br />~~ ~~[4] Ben-Gal I., Morag G., Shmilovici A., (2003) "CSPC: A Monitoring Procedure for State Dependent Processes", Technometrics, vol. 45, no. 4, 293-311.~~ ~~[[Category:Probability theory]]~~ ~~[[Category:Stochastic processes]]~~ ~~[[Category:Machine learning]]~~ ▲[[Category:Statistics]]