Word n-gram language model

The model probability should be "higher" not "lower" than the word count seen in the corpus. This is obtained, as correctly described already, by adding a count of 1 to unseen n-grams.
A '''word ''n''-gram language model''' is a purely statistical model of language. It has been superseded by [[recurrent neural network]]–based models, which have been superseded by [[large language model]]s.<ref>{{Cite journal |url=https://dl.acm.org/doi/10.5555/944919.944966 |title=A neural probabilistic language model |first1=Yoshua |last1=Bengio |first2=Réjean |last2=Ducharme |first3=Pascal |last3=Vincent |first4=Christian |last4=Janvin |date=March 1, 2003 |journal=The Journal of Machine Learning Research |volume=3 |pages=1137–1155 |via=ACM Digital Library}}</ref> It is based on the assumption that the probability of the next word in a sequence depends only on a fixed-size window of previous words. If only one previous word is considered, it is called a bigram model; if two words, a trigram model; if ''n''&nbsp;−&nbsp;1 words, an ''n''-gram model.<ref name=jm/> Special tokens, <math>\langle s\rangle</math> and <math>\langle /s\rangle</math>, are introduced to denote the start and end of a sentence.
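
The fixed-window assumption can be illustrated with a minimal Python sketch that pads a tokenized sentence with boundary markers and lists its ''n''-grams. The function name <code>ngrams</code>, the plain-string markers <code>"<s>"</code> and <code>"</s>"</code>, and the choice to pad with ''n''&nbsp;−&nbsp;1 start markers are assumptions made for illustration only.

```python
def ngrams(tokens, n):
    """Return the n-grams of a tokenized sentence, padded with boundary markers."""
    # Assumption: pad with n-1 start markers so the first real word
    # has a full context window, and a single end marker.
    padded = ["<s>"] * (n - 1) + list(tokens) + ["</s>"]
    # Slide a window of width n across the padded sequence.
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


# Bigrams (n = 2): each word is paired with the one word before it.
bigrams = ngrams(["i", "saw", "the", "cat"], 2)
```

Here the first bigram pairs the start marker with the first word, and the last pairs the final word with the end marker, so sentence boundaries are modelled like any other context.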
 
To prevent a zero probability being assigned to unseen words, each word's probability is slightly higher than its frequency count in a corpus. To calculate it, various methods were used, from simple "add-one" smoothing (assign a count of 1 to unseen ''n''-grams, as an [[uninformative prior]]) to more sophisticated models, such as [[Good–Turing discounting]] or [[Katz's back-off model|back-off models]].
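
As a rough sketch of "add-one" smoothing for a bigram model, the Python code below adds 1 to every bigram count and adds the vocabulary size to the context count, so that the smoothed probabilities for each context still sum to 1. The function name <code>train_bigram_add_one</code>, the toy corpus, and the decision to count the end marker as part of the vocabulary are assumptions for illustration, not a reference implementation.

```python
from collections import Counter


def train_bigram_add_one(sentences):
    """Return a function prob(word, prev) using add-one smoothed bigram counts."""
    context_counts = Counter()   # how often each word appears as a context
    bigram_counts = Counter()    # how often each (prev, word) pair appears
    vocab = {"</s>"}             # assumption: the end marker is a predictable word
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab_size = len(vocab)

    def prob(word, prev):
        # Add 1 to the bigram count and the vocabulary size to the context
        # count; unseen bigrams therefore get a small nonzero probability.
        return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

    return prob


prob = train_bigram_add_one([["the", "cat", "sat"], ["the", "dog", "sat"]])
unseen = prob("sat", "the")  # "the sat" never occurs, yet its probability is not zero
```

In this toy corpus the bigram "the sat" is never observed, so its unsmoothed estimate would be zero; with add-one smoothing it receives a small positive probability instead.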
 
== Unigram model ==