Word n-gram language model: Difference between revisions

{{DISPLAYTITLE:word ''n''-gram language model}}
A '''word n-gram model''' was a [[language model]] that generated probabilities of a series of words, based on the (over-simplified) assumption that the probability of the next word in a sequence depends only on a fixed-size window of previous words. It was used in [[natural language processing]] until 2003, when it was outperformed and superseded by a [[multi-layer perceptron]] with a single hidden layer and a context of several words, trained on up to 14 million words with a CPU cluster, by [[Yoshua Bengio]] and co-authors.<ref>{{Cite journal|url=https://dl.acm.org/doi/10.5555/944919.944966|title=A neural probabilistic language model|first1=Yoshua|last1=Bengio|first2=Réjean|last2=Ducharme|first3=Pascal|last3=Vincent|first4=Christian|last4=Janvin|date=March 1, 2003|journal=The Journal of Machine Learning Research|volume=3|pages=1137–1155|via=ACM Digital Library}}</ref> It has since been superseded by [[neural language model|deep learning]]-based [[large language model]]s.
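A fixed-window model of this kind can be sketched in a few lines of Python. The toy corpus and the bigram (''n'' = 2) setting below are illustrative assumptions, not from the article; a real model would be trained on millions of words:

```python
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration).
corpus = "the cat sat on the mat the cat ate the fish".split()

n = 2  # bigram model: the next word depends only on the 1 previous word
ngrams = Counter(zip(*(corpus[i:] for i in range(n))))        # counts of n-grams
contexts = Counter(zip(*(corpus[i:] for i in range(n - 1))))  # counts of (n-1)-gram contexts

def prob(word, *context):
    """P(word | context) as a relative frequency (maximum likelihood)."""
    return ngrams[context + (word,)] / contexts[context]

print(prob("cat", "the"))  # 2 of the 4 occurrences of "the" are followed by "cat" -> 0.5
```

Because the window is fixed, any dependency on words further back than ''n'' − 1 positions is simply invisible to the model.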
 
The probabilities were not equal to raw frequency counts, because a pure frequency model could not assign any portion of the total probability mass to ''n''-grams not seen in the training data. Various smoothing methods were used, from simple "add-one" smoothing (assign a count of 1 to unseen ''n''-grams, as an [[uninformative prior]]) to more sophisticated models, such as [[Good–Turing discounting]] or [[Katz's back-off model|back-off model]]s.