{{see also|Bag-of-words model}}
A special case, where n=0, is called a unigram model; it can be treated as the combination of several one-state [[Finite-state machine|finite automata]].<ref>Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze (2009). ''An Introduction to Information Retrieval''. pp. 237–240. Cambridge University Press.</ref> It assumes that the probabilities of tokens in a sequence are independent, so each word's probability in the sequence equals its probability in the document as a whole, e.g.:
<math display="block">P_\text{uni}(t_1t_2t_3)=P(t_1)P(t_2)P(t_3).</math>
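As a sketch of this independence assumption, the factorization can be computed directly. The probabilities below are hypothetical, not estimated from any real corpus:

```python
from math import prod

# Hypothetical unigram probabilities P(term); not taken from a real corpus.
unigram = {"a": 0.1, "world": 0.2, "likes": 0.05}

def p_uni(tokens, model):
    """P(t1 t2 t3) = P(t1) * P(t2) * P(t3): tokens are treated as independent."""
    return prod(model.get(t, 0.0) for t in tokens)

p = p_uni(["a", "likes", "world"], unigram)  # 0.1 * 0.05 * 0.2 = 0.001
```

Unseen tokens get probability 0 here, which zeroes out the whole product; in practice smoothing is used to avoid this.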
{| class="wikitable"
|-
! Term !! Probability in doc
|-
| a || 0.1
|}
The total probability mass distributed across the document's vocabulary is 1:
<math display="block">\sum_{\text{term in doc}} P(\text{term}) = 1</math>
The probability of a specific query is calculated as
<math display="block">P(\text{query}) = \prod_{\text{term in query}} P(\text{term})</math>
{| class="wikitable"
|-
! Term !! Probability in doc1 !! Probability in doc2
|-
| a || 0.1 || 0.3
|}
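For ranking, each document's unigram model scores the query, and the document whose model assigns the query the higher probability is preferred. A minimal sketch, assuming hypothetical per-document probabilities in the spirit of the table above:

```python
from math import prod

# Hypothetical unigram models for two documents (term -> P(term)).
doc1 = {"a": 0.1, "world": 0.2}
doc2 = {"a": 0.3, "world": 0.1}

def score(query, model):
    # P(query) = product over query terms of P(term) under the document's model
    return prod(model.get(t, 0.0) for t in query)

query = ["a", "world"]
s1, s2 = score(query, doc1), score(query, doc2)  # 0.02 vs 0.03: doc2 ranks higher
```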