Variable-order Markov model
Consider for example a sequence of [[random variable]]s, each of which takes a value from the ternary [[alphabet]] {''a'', ''b'', ''c''}. Specifically, consider the string ''aaabcaaabcaaabcaaabc...aaabc'' constructed from infinite concatenations of the sub-string ''aaabc''.
 
The VOM model of maximal order 2 can approximate the above string using ''only'' the following five [[conditional probability]] components: {Pr(''a'' | ''aa'') = 0.5, Pr(''b'' | ''aa'') = 0.5, Pr(''c'' | ''b'') = 1.0, Pr(''a'' | ''c'')= 1.0, Pr(''a'' | ''ca'') = 1.0}.
 
In this example, Pr(''c''|''ab'') = Pr(''c''|''b'') = 1.0; therefore, the shorter context ''b'' is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can generate the string exactly using only five conditional probability components, which are all equal to 1.0.
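These probabilities can be checked empirically. The sketch below (plain Python, using a long finite repetition of ''aaabc'' as a stand-in for the infinite string) tallies next-symbol counts for each context of a given length:

```python
from collections import Counter, defaultdict

def conditional_counts(s, order):
    """Count next-symbol occurrences for every context of the given length."""
    counts = defaultdict(Counter)
    for i in range(order, len(s)):
        counts[s[i - order:i]][s[i]] += 1
    return counts

s = "aaabc" * 1000  # finite stand-in for the infinite concatenation

c1 = conditional_counts(s, 1)
c2 = conditional_counts(s, 2)

# Pr(c | b) = 1.0: the length-1 context b already determines the next symbol
print(c1["b"])   # Counter({'c': 1000})
# Pr(a | aa) = Pr(b | aa) = 0.5: context aa is genuinely ambiguous
print(c2["aa"])
```

The counts confirm that after ''b'' the next character is always ''c'', while after ''aa'' the continuations ''a'' and ''b'' each occur half the time.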
To construct the [[Markov chain]] of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: {Pr(''a'' | ''a''), Pr(''a'' | ''b''), Pr(''a'' | ''c''), Pr(''b'' | ''a''), Pr(''b'' | ''b''), Pr(''b'' | ''c''), Pr(''c'' | ''a''), Pr(''c'' | ''b''), Pr(''c'' | ''c'')}. To construct the Markov chain of order 2 for the next character, one must estimate 27 conditional probability components: {Pr(''a'' | ''aa''), Pr(''a'' | ''ab''), ..., Pr(''c'' | ''cc'')}. To construct the Markov chain of order 3 for the next character, one must estimate 81 conditional probability components: {Pr(''a'' | ''aaa''), Pr(''a'' | ''aab''), ..., Pr(''c'' | ''ccc'')}.
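The component counts above follow directly from the alphabet size: an order-''d'' chain over a ''k''-letter alphabet has <math>k^d</math> contexts, each needing ''k'' next-symbol probabilities.

```python
# Components needed for a fixed-order Markov chain over a k-letter
# alphabet: k**order contexts, each with k next-symbol probabilities.
k = 3  # alphabet {a, b, c}
for order in (1, 2, 3):
    print(order, k ** order * k)  # 1 9, 2 27, 3 81
```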
 
In practical settings there is seldom sufficient data to accurately estimate the [[exponential growth|exponentially increasing]] number of conditional probability components as the order of the Markov chain increases.
Given a training set of observed states, <math>x_1^{n}</math>, the construction algorithm of the VOM models<ref name="Shmilovici"/><ref name="Begleiter"/><ref name="Ben-Gal"/> learns a model {{mvar|P}} that provides a [[probability]] assignment for each state in the sequence given its past (previously observed symbols) or future states.
Specifically, the learner generates a [[conditional distribution|conditional probability distribution]] <math>P(x_i \mid s)</math> for a symbol <math>x_i \in A</math> given a context <math>s \in A^*</math>, where the * sign represents a sequence of states of any length, including the empty context.
 
VOM models attempt to estimate [[conditional distribution]]s of the form <math>P(x_i \mid s)</math> where the context length <math>|s| \le D</math> varies depending on the available statistics.
In contrast, conventional [[Markov chain|Markov models]] attempt to estimate these [[conditional distribution]]s by assuming a fixed context length <math>|s| = D</math> and hence can be considered special cases of the VOM models.
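As an illustration only (a toy sketch, not the pruning-based construction algorithms cited above), a variable-order learner can store next-symbol counts for every context up to length ''D'' and, when predicting, fall back from the longest matching context to shorter suffixes; all names here are hypothetical:

```python
from collections import Counter, defaultdict

class SimpleVOM:
    """Toy variable-order model: counts for every context of length <= D,
    prediction backs off to the longest context suffix seen in training."""

    def __init__(self, max_order):
        self.D = max_order
        self.counts = defaultdict(Counter)  # context -> next-symbol counts

    def fit(self, s):
        for i in range(len(s)):
            for d in range(self.D + 1):       # contexts of length 0..D
                if i - d >= 0:
                    self.counts[s[i - d:i]][s[i]] += 1

    def prob(self, symbol, context):
        """Estimate P(symbol | context), backing off to shorter suffixes
        until one was observed during training."""
        for start in range(max(0, len(context) - self.D), len(context) + 1):
            suffix = context[start:]
            if suffix in self.counts:
                c = self.counts[suffix]
                return c[symbol] / sum(c.values())
        return 0.0  # unfitted model

model = SimpleVOM(max_order=2)
model.fit("aaabc" * 1000)
print(model.prob("c", "ab"))  # 1.0
```

A fixed-order model would always condition on exactly ''D'' symbols; here the effective context length adapts to whatever suffix the training data supports.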