{{DISPLAYTITLE:word ''n''-gram language model}}
A '''word ''n''-gram language model''' is a purely statistical model of natural language, as opposed to the [[Recurrent neural network|recurrent]] neural-network-based models that replaced it in the 2000s and the [[large language model]]-based models that superseded those in the early 2020s.<ref>{{Cite journal|url=https://dl.acm.org/doi/10.5555/944919.944966|title=A neural probabilistic language model|first1=Yoshua|last1=Bengio|first2=Réjean|last2=Ducharme|first3=Pascal|last3=Vincent|first4=Christian|last4=Janvin|date=March 1, 2003|journal=The Journal of Machine Learning Research|volume=3|pages=1137–1155|via=ACM Digital Library}}</ref> It is based on the assumption that the probability of the next word in a sequence depends only on a fixed-size window of previous words. If only one previous word was considered, it was called a bigram model; if two words, a trigram model; if ''n'' − 1 words, an ''n''-gram model.<ref name=jm/> Special tokens, <math>\langle s\rangle</math> and <math>\langle /s\rangle</math>, were introduced to denote the start and end of a sentence.
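The counting scheme above can be sketched in a few lines of Python. This is a minimal illustration of a bigram model (a window of one previous word) estimated from a toy two-sentence corpus padded with the boundary tokens; the corpus and all function names are illustrative, not drawn from any particular implementation:

```python
from collections import defaultdict

# Toy corpus; each sentence is padded with the <s> and </s> boundary tokens.
corpus = [
    ["<s>", "the", "cat", "sat", "</s>"],
    ["<s>", "the", "dog", "sat", "</s>"],
]

# Count bigrams and the contexts (previous words) they occur in.
bigram_counts = defaultdict(int)
context_counts = defaultdict(int)
for sentence in corpus:
    for prev, word in zip(sentence, sentence[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1

def bigram_prob(prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]
```

For instance, with this corpus "cat" follows "the" in one of the two occurrences of "the", so the unsmoothed estimate of P(cat | the) is 0.5, while any bigram never seen in the corpus gets probability 0, which motivates the smoothing discussed next.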
 
To prevent a zero probability from being assigned to unseen words, each word's estimated probability is set slightly below its raw relative frequency in the corpus, reserving probability mass for unseen events. Various smoothing methods were used to calculate these estimates, from simple "add-one" smoothing (assigning a pseudo-count of 1 to unseen ''n''-grams, as an [[uninformative prior]]) to more sophisticated models, such as [[Good–Turing discounting]] or [[Katz's back-off model|back-off model]]s.
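The effect of add-one (Laplace) smoothing can be seen in a small self-contained Python sketch. The counts, vocabulary, and names below are hypothetical, chosen only to contrast the unsmoothed and smoothed estimates for a word that never appears in the corpus:

```python
from collections import Counter

# Hypothetical unigram counts from a tiny corpus (illustrative only).
counts = Counter({"the": 3, "sat": 2, "cat": 1})
vocab = {"the", "sat", "cat", "dog"}  # "dog" is in the vocabulary but unseen

total = sum(counts.values())  # 6 tokens in the corpus

def mle_prob(word):
    # Unsmoothed maximum-likelihood estimate: unseen words get probability 0.
    return counts[word] / total

def add_one_prob(word):
    # Add-one (Laplace) smoothing: every word's count is incremented by 1,
    # so unseen words receive a small nonzero probability, and seen words'
    # probabilities drop slightly below their raw relative frequencies.
    return (counts[word] + 1) / (total + len(vocab))
```

Here the unseen word "dog" moves from probability 0 to (0 + 1)/(6 + 4) = 0.1, while "the" drops from 3/6 = 0.5 to 4/10 = 0.4, illustrating how the smoothing redistributes probability mass from seen to unseen events.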