Added spam filtering as an application area - it has been shown that VOM models originally developed for data compression can be successfully applied to spam filtering. |
→Example: corrected mistakes with probabilities and number of necessary conditional probability components |
Line 7:
Consider, for example, a sequence of [[random variable]]s, each of which takes a value from the ternary [[alphabet]] {''a'', ''b'', ''c''}. Specifically, consider the string ''aaabcaaabcaaabcaaabc...aaabc'' constructed by repeatedly concatenating the sub-string ''aaabc''.
The VOM model of maximal order 2 can approximate the above string using ''only'' the following
In this example, Pr(''c''|''ab'') = Pr(''c''|''b'') = 1.0; therefore, the shorter context ''b'' is sufficient to determine the next character. Similarly, the VOM model of maximal order 3 can
To construct the [[Markov chain]] of order 1 for the next character in that string, one must estimate the following 9 conditional probability components: {Pr(''a''|''a''), Pr(''a''|''b''), Pr(''a''|''c''), Pr(''b''|''a''), Pr(''b''|''
In practical settings there is seldom sufficient data to accurately estimate the [[exponential growth|exponentially increasing]] number of conditional probability components as the order of the Markov chain increases.
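The two points above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names <code>conditional_probs</code> and <code>full_markov_components</code> are hypothetical, and a finite string of 200 repetitions of ''aaabc'' is assumed as a stand-in for the example string. It estimates conditional probabilities by simple counting and computes the number of components a fixed-order Markov chain needs, which is the alphabet size raised to the power (order + 1).

```python
from collections import Counter, defaultdict


def conditional_probs(text, order):
    """Estimate Pr(next symbol | context) by counting, for contexts of length `order`."""
    counts = defaultdict(Counter)
    for i in range(order, len(text)):
        context = text[i - order:i]
        counts[context][text[i]] += 1
    return {
        context: {sym: n / sum(c.values()) for sym, n in c.items()}
        for context, c in counts.items()
    }


def full_markov_components(alphabet_size, order):
    """Number of conditional probabilities in a fixed-order Markov chain:
    alphabet_size**order contexts, each with alphabet_size possible next symbols."""
    return alphabet_size ** (order + 1)


# Assumption: a finite sample standing in for the repeated 'aaabc' string.
text = "aaabc" * 200

p1 = conditional_probs(text, 1)  # contexts of length 1
p2 = conditional_probs(text, 2)  # contexts of length 2

# The shorter context 'b' predicts 'c' as well as the longer context 'ab' does.
print(p1["b"]["c"], p2["ab"]["c"])

# Component counts for fixed-order chains over a ternary alphabet.
for order in (1, 2, 3):
    print(order, full_markov_components(3, order))
```

On this sample the estimates for Pr(''c''|''b'') and Pr(''c''|''ab'') coincide, which is why a variable-order model can drop the longer context, while the fixed-order count grows from 9 to 27 to 81 components as the order goes from 1 to 3.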