A '''word ''n''-gram language model''' is a purely statistical model of language. It has been superseded by [[recurrent neural network]]–based models, which have in turn been superseded by [[large language model]]s.<ref>{{Cite journal |url=https://dl.acm.org/doi/10.5555/944919.944966 |title=A neural probabilistic language model |first1=Yoshua |last1=Bengio |first2=Réjean |last2=Ducharme |first3=Pascal |last3=Vincent |first4=Christian |last4=Janvin |date=March 1, 2003 |journal=The Journal of Machine Learning Research |volume=3 |pages=1137–1155 |via=ACM Digital Library}}</ref> It is based on the assumption that the probability of the next word in a sequence depends only on a fixed-size window of previous words. If only one previous word is considered, it is called a bigram model; if two words, a trigram model; if ''n'' − 1 words, an ''n''-gram model.<ref name=jm/> Special tokens, <math>\langle s\rangle</math> and <math>\langle /s\rangle</math>, are introduced to denote the start and end of a sentence.
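As an illustration, a maximum-likelihood bigram model can be sketched in a few lines of Python; the toy corpus and the function names below are assumptions made for this sketch, not part of the model's definition:

```python
from collections import defaultdict

# Toy corpus; each sentence is padded with start/end tokens <s> and </s>.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

bigram_counts = defaultdict(int)
unigram_counts = defaultdict(int)
for sentence in corpus:
    tokens = sentence.split()
    for w1, w2 in zip(tokens, tokens[1:]):
        bigram_counts[(w1, w2)] += 1  # count of the pair (w1, w2)
        unigram_counts[w1] += 1       # count of w1 as a bigram's first word

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1, w2) / count(w1)."""
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

print(bigram_prob("<s>", "I"))  # 2/3: two of the three sentences start with "I"
```

Because the estimate for each word conditions only on the single previous token, the probability of a whole sentence factors into a product of such bigram probabilities.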
To prevent a zero probability from being assigned to unseen words, each word's probability is smoothed so that it differs slightly from its relative frequency in a [[Text corpus|corpus]]. Various smoothing methods are used for this, from simple "add-one" smoothing (assigning a count of 1 to unseen ''n''-grams, as an [[uninformative prior]]) to more sophisticated models, such as [[Good–Turing discounting]] or [[Katz's back-off model|back-off models]].
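The effect of add-one (Laplace) smoothing can be sketched as follows; the counts and the vocabulary size here are made-up values for illustration only:

```python
from collections import Counter

# Hypothetical bigram counts from a tiny corpus (illustrative values).
bigram_counts = Counter({("I", "am"): 2, ("am", "Sam"): 1})
unigram_counts = Counter({"I": 3, "am": 2})
V = 10  # assumed vocabulary size

def mle_prob(w1, w2):
    # Unsmoothed maximum-likelihood estimate: zero for any unseen bigram.
    return bigram_counts[(w1, w2)] / unigram_counts[w1]

def add_one_prob(w1, w2):
    # Add-one smoothing: every possible bigram gets a pseudo-count of 1,
    # so unseen bigrams receive a small non-zero probability.
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + V)

print(mle_prob("I", "saw"))      # 0.0 — unseen bigram gets zero probability
print(add_one_prob("I", "saw"))  # (0 + 1) / (3 + 10) ≈ 0.077
```

Adding the vocabulary size ''V'' to the denominator keeps the smoothed probabilities over all continuations summing to one; more sophisticated schemes such as Good–Turing or back-off redistribute the probability mass less uniformly.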
== Unigram model ==