Word n-gram language model: Difference between revisions

Additionally, without an end-of-sentence marker, the probability of an ungrammatical sequence ''*I saw the'' would always be higher than that of the longer sentence ''I saw the red house.''
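This effect can be seen numerically: since every conditional probability factor is less than 1, extending a sequence can only shrink its total probability. A toy sketch with hypothetical bigram probabilities (all values here are invented for illustration):

```python
# Hypothetical bigram probabilities, with <s> as a start-of-sentence marker
# but no end-of-sentence marker.
probs = {
    ("<s>", "I"): 0.2,
    ("I", "saw"): 0.3,
    ("saw", "the"): 0.4,
    ("the", "red"): 0.1,
    ("red", "house"): 0.5,
}

def seq_prob(words):
    """Probability of a word sequence as a product of bigram factors."""
    p = 1.0
    for prev, word in zip(["<s>"] + words, words):
        p *= probs[(prev, word)]
    return p

p_prefix = seq_prob(["I", "saw", "the"])            # 0.2 * 0.3 * 0.4 = 0.024
p_full = seq_prob(["I", "saw", "the", "red", "house"])  # 0.024 * 0.1 * 0.5 = 0.0012
assert p_prefix > p_full  # the ungrammatical prefix always scores higher
```

Adding an end-of-sentence factor P(&lt;/s&gt; | house) to complete sentences is what lets a grammatical full sentence compete with its own prefixes.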
 
== Unigram model ==
{{see also|Bag-of-words model}}
 
A special case of the n-gram model is the unigram model, where n=1, so that each token is generated independently of its context. A unigram model can be treated as the combination of several one-state [[Finite-state machine|finite automata]].<ref>Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze (2009). ''An Introduction to Information Retrieval''. pp. 237–240. Cambridge University Press.</ref> It assumes that the probabilities of the tokens in a sequence are independent, e.g.:
 
<math display="block">P_\text{uni}(t_1t_2t_3)=P(t_1)P(t_2)P(t_3).</math>
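A minimal unigram model can be estimated by maximum likelihood from token counts; the probability of a sequence is then just the product of the individual token probabilities. A sketch (the corpus here is an invented example):

```python
from collections import Counter

# Toy training corpus (illustrative only).
corpus = "the cat sat on the mat the cat ran".split()

counts = Counter(corpus)
total = sum(counts.values())

def p_uni(token):
    """Maximum-likelihood unigram probability: count(token) / corpus size."""
    return counts[token] / total

def p_seq(tokens):
    """P_uni(t1 t2 ... tk) = P(t1) * P(t2) * ... * P(tk)."""
    p = 1.0
    for t in tokens:
        p *= p_uni(t)
    return p

# "the" occurs 3 times and "cat" 2 times out of 9 tokens,
# so P("the cat") = (3/9) * (2/9).
print(p_seq(["the", "cat"]))
```

In practice the counts are smoothed, since any unseen token would otherwise give the whole sequence probability zero.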