Word n-gram language model: Difference between revisions

Additionally, without an end-of-sentence marker, the probability of an ungrammatical sequence ''*I saw the'' would always be higher than that of the longer sentence ''I saw the red house.''
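This effect can be seen numerically: since every conditional probability factor is less than 1, extending a sequence can only shrink its total probability. A toy sketch with hypothetical bigram probabilities (all values here are invented for illustration):

```python
# Hypothetical bigram probabilities, with <s> as a start-of-sentence marker
# but no end-of-sentence marker.
probs = {
    ("<s>", "I"): 0.2,
    ("I", "saw"): 0.3,
    ("saw", "the"): 0.4,
    ("the", "red"): 0.1,
    ("red", "house"): 0.5,
}

def seq_prob(words):
    """Probability of a word sequence as a product of bigram factors."""
    p = 1.0
    for prev, word in zip(["<s>"] + words, words):
        p *= probs[(prev, word)]
    return p

p_prefix = seq_prob(["I", "saw", "the"])            # 0.2 * 0.3 * 0.4 = 0.024
p_full = seq_prob(["I", "saw", "the", "red", "house"])  # 0.024 * 0.1 * 0.5 = 0.0012
assert p_prefix > p_full  # the ungrammatical prefix always scores higher
```

Adding an end-of-sentence factor P(&lt;/s&gt; | house) to complete sentences is what lets a grammatical full sentence compete with its own prefixes.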
 
== Unigram model ==
{{see also|Bag-of-words model}}
 
A special case of the n-gram model is the unigram model, where n=1, so that each token is generated independently of its context. A unigram model can be treated as the combination of several one-state [[Finite-state machine|finite automata]].<ref>Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze (2009). ''An Introduction to Information Retrieval''. pp. 237–240. Cambridge University Press.</ref> It assumes that the probabilities of the tokens in a sequence are independent, e.g.:
 
<math display="block">P_\text{uni}(t_1t_2t_3)=P(t_1)P(t_2)P(t_3).</math>
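A minimal unigram model can be estimated by maximum likelihood from token counts; the probability of a sequence is then just the product of the individual token probabilities. A sketch (the corpus here is an invented example):

```python
from collections import Counter

# Toy training corpus (illustrative only).
corpus = "the cat sat on the mat the cat ran".split()

counts = Counter(corpus)
total = sum(counts.values())

def p_uni(token):
    """Maximum-likelihood unigram probability: count(token) / corpus size."""
    return counts[token] / total

def p_seq(tokens):
    """P_uni(t1 t2 ... tk) = P(t1) * P(t2) * ... * P(tk)."""
    p = 1.0
    for t in tokens:
        p *= p_uni(t)
    return p

# "the" occurs 3 times and "cat" 2 times out of 9 tokens,
# so P("the cat") = (3/9) * (2/9).
print(p_seq(["the", "cat"]))
```

In practice the counts are smoothed, since any unseen token would otherwise give the whole sequence probability zero.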