Revision as of 23:09, 8 May 2025 by Cosmia Nebula. Edit summary: →Skip-gram language model: caption. Tag: Visual edit.
Revision as of 06:45, 26 May 2025 by 188.192.44.87. Edit summary: The model probability should be "higher" not "lower" than the word count seen in the corpus. This is obtained, as correctly described already, by adding a count of 1 to unseen n-grams.
Line 3:
A '''word ''n''-gram language model''' is a purely statistical model of language. It has been superseded by [[recurrent neural network]]–based models, which have been superseded by [[large language model]]s.<ref>{{Cite journal |url=https://dl.acm.org/doi/10.5555/944919.944966 |title=A neural probabilistic language model |first1=Yoshua |last1=Bengio |first2=Réjean |last2=Ducharme |first3=Pascal |last3=Vincent |first4=Christian |last4=Janvin |date=March 1, 2003 |journal=The Journal of Machine Learning Research |volume=3 |pages=1137–1155 |via=ACM Digital Library}}</ref> It is based on an assumption that the probability of the next word in a sequence depends only on a fixed size window of previous words. If only one previous word is considered, it is called a bigram model; if two words, a trigram model; if ''n'' − 1 words, an ''n''-gram model.<ref name=jm/> Special tokens are introduced to denote the start and end of a sentence <math>\langle s\rangle</math> and <math>\langle /s\rangle</math>. To prevent a zero probability being assigned to unseen words, each word's probability is slightly ~~lower~~higher than its frequency count in a corpus. To calculate it, various methods were used, from simple "add-one" smoothing (assign a count of 1 to unseen ''n''-grams, as an [[uninformative prior]]) to more sophisticated models, such as [[Good–Turing discounting]] or [[Katz's back-off model|back-off models]].

== Unigram model ==
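The "add-one" smoothing mentioned in the lead can be illustrated with a minimal sketch for a bigram model. The function name `train_bigram_add_one`, the whitespace tokenization, and the toy corpus in the usage note are assumptions made for illustration; they do not come from the article.

```python
from collections import Counter


def train_bigram_add_one(sentences):
    """Fit an add-one (Laplace) smoothed bigram model.

    Hypothetical helper for illustration. Returns (prob, vocab), where
    prob(word, prev) estimates P(word | prev).
    """
    bigram_counts = Counter()   # count of (prev, word) pairs
    context_counts = Counter()  # count of prev as a context
    vocab = {"</s>"}            # words that can be predicted
    for sentence in sentences:
        # Wrap each sentence in the start and end tokens.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])  # "<s>" is never predicted
        for prev, word in zip(tokens, tokens[1:]):
            bigram_counts[(prev, word)] += 1
            context_counts[prev] += 1

    vocab_size = len(vocab)

    def prob(word, prev):
        # Add 1 to every bigram count, seen or unseen, and renormalize
        # by adding the vocabulary size to the context count.
        return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + vocab_size)

    return prob, vocab
```

With the toy corpus `["the cat sat", "the dog sat"]`, the unseen bigram `("the", "sat")` receives a nonzero probability, while the seen bigram `("the", "cat")` ends up below its relative frequency of 0.5, because probability mass is moved from seen to unseen bigrams.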

Word n-gram language model: Difference between revisions