Variable-order Markov model

Template:Wikify is deprecated. Please use a more specific cleanup template as listed in the documentation.

Variable order Markov (VOM) Models are an important class of models that extend the well known Markov Chain Models. In contrast to the Markov Chain Models, where each random variable in a sequence with a Markov property depends on a fixed number of random variables, in VOM models this number of conditioning random variables may vary based on the specific observed realization. This realization sequence is often called the context and thus the VOM models are also called Context Trees [1]. The flexibility in the number of conditioning random variables turns out to be of real advantage for many applications, such as statistical analysis, classification and prediction.

Definition

Let $A$ be a state space (finite alphabet) of size |A|.

Consider a sequence with the Markov property $x_{1}^{n}=x_{1}x_{2}...x_{n}$ of $n$ realizations of random variables, where $x_{i}\in A$ is the state (symbol) at position $i$ 1≤ $i$ ≤ $n$ , and the concatenation of states $x_{i}$ and $x_{i+1}$ is denoted by $x_{i}x_{i+1}$ .

Given a training set of observed states, $x_{1}^{n}$ , the construction algorithm of the VOM models [2,3,4] learns a model $P$ that provides a probability assignment for each state in the sequence given its past (previously observed symbols) or future states.

Specifically, the learner generates a conditional probability distribution $P(x_{i}|s)$ for a symbol $x_{i}\in A$ given a context $s\in A^{*}$ , where the * sign represents a sequence of states of any length, including the empty context.

VOM models attempt to estimate conditional distributions of the form $P(x_{i}|s)$ where the context length | $s$ |≤ $D$ varies depending on the available statistics. In contrast, conventional Markov models attempt to estimate these conditional distributions by assuming a fixed contexts' length | $s$ |= $D$ and, hence, can be considered as special cases of the VOM models.

Effectively, for a given training sequence, the VOM models are found to obtain better model parameterization than the fixed-order Markov Models that leads to a better variance-bias tradeoff of the learned models [2,3,4].

Example

Consider for example a sequence of random variables each of which takes value from the ternary alphabet {a,b,c}.

Consider for example the string aaabcaaabcaaabcaaabc…aaabc constructed from infinite concatenations of the sub-string aaabc.

The VOM model of maximal order 2 can approximate the above string using only the following four conditional probability components {Pr(a|aa)=0.5, Pr(b|aa)=0.5, Pr(c|b)=1.0, Pr(a|c)= 1.0}.

In this example, Pr(c|ab)=Pr(c|b)=1.0, therefore, the shorter context b is sufficient to determine the future character. Similarly, the VOM model of maximal order 3 can approximate the string using only four conditional probability components.

To construct the Markov chain of order 1 for the next character in this sequence, one needs to estimate the following 9 conditional probability components {Pr(a|a), Pr(a|b), Pr(a|c), Pr(b|a), Pr(b|a), Pr(b|a), Pr(c|a), Pr(c|a), Pr(c|a)}.

To construct the Markov chain of order 2 for the next character in this sequence, one needs to estimate the following 27 conditional probability components {Pr(a|aa), Pr(a|ab), …, Pr(c|cc)}.

To construct the Markov chain of order three for the next character in this sequence, one needs to estimate the following 81 conditional probability components {Pr(a|aaa), Pr(a|aab), …, Pr(c|ccc)}.

In practical settings there is seldom sufficient data to accurately estimate the exponential growing number of conditional probability components as the order of the Markov chain increases.

The Variable Order Markov model assumes that in realistic settings, there are certain realizations of states (represented by contexts) in which some past states are independent from the future states, accordingly, a great reduction in the number of model parameters can be achieved.

Application Areas

Various efficient algorithms were devised for estimating the parameters of the VOM model [3].

VOM models were successfully applied to areas such as Machine learning, Information theory and Bioinformatics including specific applications such as coding and data compression [1] document compression [3], classification and identification of DNA and protein sequences [2], Statistical process control [4] and more.

References

[1] Rissanen J. (1983). “A Universal Data Compression System”. IEEE Transactions on Information Theory. 29 (5):656- 664.
[2] Shmilovici A., Ben-Gal I. (2007), “Using a VOM Model for Reconstructing Potential Coding Regions in EST Sequences”, Computational Statistics vol. 22, no. 1, 49-69.
[3] Begleiter R., El-Yaniv R., Yona G., (2004), "On Prediction Using Variable Order Markov Models", Journal of Artificial Intelligence, 22:385-421.
[4] Ben-Gal I., Morag G., Shmilovici A., (2003) "CSPC: A Monitoring Procedure for State Dependent Processes", Technometrics, vol. 45, no. 4, 293-311.

Variable-order Markov model

Contents

Definition

Example

Application Areas

See also

References