Let A be a state space (finite alphabet) of size |A|. Consider a sequence with the Markov property, x_1^n = x_1 x_2 … x_n, of n realizations of random variables, where x_i ∈ A is the state (symbol) at position i (1 ≤ i ≤ n), and the concatenation of states x_i and x_{i+1} is denoted by x_i x_{i+1}. Given a training set of observed states, x_1^n, the construction algorithm of the VOM models learns a model P that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states. Specifically, the learner generates a conditional probability distribution P(x_i | s) for a symbol x_i ∈ A given a context s ∈ A*, where the * sign represents a sequence of states of any length, including the empty context. VOM models attempt to estimate conditional distributions of the form P(x_i | s), where the context length |s| ≤ D varies depending on the available statistics. In contrast, conventional Markov models estimate these conditional distributions by assuming a fixed context length |s| = D and, hence, can be considered special cases of the VOM models. For a given training sequence, VOM models have been found to obtain better model parameterization than fixed-order Markov models, which leads to a better variance-bias tradeoff of the learned models (Ben-Gal et al., 2005).
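The idea of letting the context length depend on the available statistics can be illustrated with a minimal Python sketch. It is not one of the published VOM construction algorithms: it simply counts every context up to a maximal length and, at prediction time, falls back to the longest suffix of the history that was observed often enough. The names `train_counts`, `predict`, `max_order` and `min_count` are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def train_counts(sequence, max_order):
    """Count next-symbol occurrences for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, symbol in enumerate(sequence):
        # Contexts are the k symbols immediately preceding position i.
        for k in range(0, min(max_order, i) + 1):
            counts[sequence[i - k:i]][symbol] += 1
    return counts


def predict(counts, history, max_order, min_count=2):
    """Return (context used, distribution) for the longest well-supported suffix."""
    for k in range(min(max_order, len(history)), -1, -1):
        context = history[len(history) - k:]
        total = sum(counts[context].values()) if context in counts else 0
        if total >= min_count:
            return context, {s: n / total for s, n in counts[context].items()}
    return "", {}  # not even the empty context has enough observations


counts = train_counts("aaabc" * 20, max_order=2)
print(predict(counts, "aa", max_order=2))
print(predict(counts, "xb", max_order=2))  # unseen context "xb" falls back to "b"
```

A fixed-order model corresponds to always using exactly `max_order` symbols of history; the fallback loop is what makes the effective order variable in this sketch.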
==Example==
Consider, for example, a sequence of random variables, each of which takes a value from the ternary state space {a,b,c}.
To construct the Markov chain of order 1 for the next state in this sequence, one needs to estimate the following 9 conditional probability components: {Pr(a|a), Pr(a|b), Pr(a|c), Pr(b|a), Pr(b|b), Pr(b|c), Pr(c|a), Pr(c|b), Pr(c|c)}.
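As a quick check on this count, the following Python sketch enumerates every (symbol, context) pair that a fixed-order chain over a given alphabet has to estimate. The helper name `fixed_order_components` is hypothetical and used only for illustration.

```python
from itertools import product


def fixed_order_components(alphabet, order):
    """List every (symbol, context) pair a fixed-order Markov chain must estimate."""
    contexts = ["".join(c) for c in product(alphabet, repeat=order)]
    return [(symbol, context) for symbol in alphabet for context in contexts]


components = fixed_order_components("abc", 1)
print(len(components))
print(", ".join(f"Pr({x}|{s})" for x, s in components))
```

The number of components is the alphabet size raised to the power (order + 1), so it grows quickly as the order increases.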
Consider, for example, the string aaabcaaabcaaabcaaabc…aaabc, constructed by repeated concatenation of the sub-string aaabc. The VOM model of maximal order 2 can approximate the string using only the following four conditional probability components: {Pr(a|aa)=0.5, Pr(b|aa)=0.5, Pr(c|b)=1.0, Pr(a|c)=1.0}. In this example, Pr(c|ab)=Pr(c|b)=1.0; therefore, the shorter context b is sufficient to determine the future state. Similarly, the VOM model of maximal order 3 can approximate the string using only four conditional probability components.
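The probabilities quoted for this string can be reproduced empirically. The Python sketch below counts next-symbol frequencies for contexts of a given length in a long repetition of aaabc; the function name `conditional_probs` and the choice of 1000 repetitions are assumptions made for this illustration.

```python
from collections import Counter, defaultdict


def conditional_probs(sequence, context_len):
    """Empirical Pr(next symbol | preceding context_len symbols)."""
    counts = defaultdict(Counter)
    for i in range(context_len, len(sequence)):
        context = sequence[i - context_len:i]
        counts[context][sequence[i]] += 1
    return {
        ctx: {sym: n / sum(c.values()) for sym, n in c.items()}
        for ctx, c in counts.items()
    }


text = "aaabc" * 1000
order1 = conditional_probs(text, 1)
order2 = conditional_probs(text, 2)

print(order2["aa"])                # distribution after context "aa"
print(order1["b"], order2["ab"])   # contexts "b" and "ab" give the same prediction
print(order1["c"])                 # distribution after context "c"
```

Because the contexts ab and b yield the same distribution, storing the longer context adds nothing, which is the kind of redundancy a variable-order model avoids.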
==Application Areas==
Various efficient algorithms have been devised for estimating the parameters of the VOM model [3]. VOM models have been successfully applied to areas such as machine learning, information theory and bioinformatics, including specific applications such as coding and data compression [1], document compression [3], classification and identification of DNA and protein sequences [2], statistical process control [4] and more.
==See Also==
• Markov Chains
• Examples of Markov chains
• Markov process
• Markov Chain Monte Carlo
==References==
[1] Rissanen, J. (1983). "A Universal Data Compression System". IEEE Transactions on Information Theory. 29 (5): 656–664.