A '''word ''n''-gram language model''' is a purely statistical model of language. It has been superseded by [[recurrent neural network]]–based models, which have in turn been superseded by [[large language model]]s.<ref>{{Cite journal |url=https://dl.acm.org/doi/10.5555/944919.944966 |title=A neural probabilistic language model |first1=Yoshua |last1=Bengio |first2=Réjean |last2=Ducharme |first3=Pascal |last3=Vincent |first4=Christian |last4=Janvin |date=March 1, 2003 |journal=The Journal of Machine Learning Research |volume=3 |pages=1137–1155 |via=ACM Digital Library}}</ref> It is based on the assumption that the probability of the next word in a sequence depends only on a fixed-size window of previous words. If only one previous word is considered, it is called a bigram model; if two words, a trigram model; if ''n'' − 1 words, an ''n''-gram model.<ref name=jm/> Special tokens, <math>\langle s\rangle</math> and <math>\langle /s\rangle</math>, are introduced to denote the start and end of a sentence.
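The bigram case can be illustrated with a short sketch. It assumes whitespace tokenization and plain maximum-likelihood estimation with no smoothing: each conditional probability is the bigram count divided by the count of the preceding token, <math>P(w_i \mid w_{i-1}) = C(w_{i-1}, w_i) / C(w_{i-1})</math>, and every sentence is padded with <math>\langle s\rangle</math> and <math>\langle /s\rangle</math>. The function names and the toy corpus are illustrative only and do not come from any particular library.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Maximum-likelihood bigram estimates P(word | prev) = C(prev, word) / C(prev)."""
    context_counts = Counter()  # C(prev): how often each token precedes another token
    bigram_counts = Counter()   # C(prev, word)
    for sentence in sentences:
        # Pad with the sentence-start and sentence-end tokens <s> and </s>.
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {
        (prev, word): count / context_counts[prev]
        for (prev, word), count in bigram_counts.items()
    }


def sentence_probability(probs, sentence):
    """Probability of a whole sentence as a product of bigram probabilities.

    Returns 0.0 if the sentence contains any bigram never seen in training.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= probs.get((prev, word), 0.0)
    return p


corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
probs = bigram_probabilities(corpus)
print(probs[("<s>", "I")])                      # 2/3 (about 0.667)
print(sentence_probability(probs, "I am Sam"))  # 2/3 * 2/3 * 1/2 * 1/2 = 1/9 (about 0.111)
```

Under this unsmoothed estimate, any sentence containing a bigram absent from the training data, such as "Sam likes ham", receives probability zero.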
To prevent a zero probability being assigned to unseen words, each word's probability is made slightly lower than its relative frequency in a corpus.
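One simple way to do this is add-one (Laplace) smoothing, which adds a pseudo-count of 1 to every bigram before normalizing: <math>P(w_i \mid w_{i-1}) = \frac{C(w_{i-1}, w_i) + 1}{C(w_{i-1}) + |V|}</math>, where <math>|V|</math> is the number of possible next tokens. The sketch below assumes whitespace tokenization and takes <math>V</math> to be every observed word plus <math>\langle /s\rangle</math>; the class name <code>AddOneBigramModel</code> is hypothetical.

```python
from collections import Counter


class AddOneBigramModel:
    """Bigram model with add-one (Laplace) smoothing.

    P(word | prev) = (C(prev, word) + 1) / (C(prev) + |V|), where V is taken
    to be every word seen in training plus the end token </s>.
    """

    def __init__(self, sentences):
        self.context_counts = Counter()  # C(prev)
        self.bigram_counts = Counter()   # C(prev, word)
        self.vocab = {"</s>"}            # possible next tokens
        for sentence in sentences:
            words = sentence.split()
            self.vocab.update(words)
            tokens = ["<s>"] + words + ["</s>"]
            for prev, word in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, word)] += 1

    def prob(self, prev, word):
        # Each bigram receives one extra pseudo-count, so unseen bigrams get a
        # small nonzero probability and seen bigrams are discounted.
        return (self.bigram_counts[(prev, word)] + 1) / (
            self.context_counts[prev] + len(self.vocab)
        )


model = AddOneBigramModel(["I am Sam", "Sam I am", "I do not like green eggs and ham"])
print(model.prob("I", "am"))     # (2 + 1) / (3 + 11) = 3/14
print(model.prob("Sam", "ham"))  # unseen bigram: (0 + 1) / (2 + 11) = 1/13
```

On a corpus this small, add-one smoothing moves a large share of probability mass to unseen bigrams: the unsmoothed estimate of <math>P(\text{am} \mid \text{I})</math> on the same three sentences is 2/3, but here it drops to 3/14. This is one reason more refined methods, such as Good–Turing discounting or back-off models, are often preferred.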
== Unigram model ==
=== Skip-gram language model ===
[[File:1-skip-2-gram.svg|thumb|1-skip-2-grams for the text "the rain in Spain falls mainly on the plain"]]
The skip-gram language model is an attempt to overcome the data sparsity problem faced by the preceding model (the word ''n''-gram language model). Words represented in an embedding vector are no longer necessarily consecutive but may leave gaps that are ''skipped'' over (hence the name "skip-gram").<ref>{{cite web|url=http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf|title=A Closer Look at Skip-gram Modelling|author=David Guthrie|date=2006|display-authors=etal|access-date=27 April 2014|archive-url=https://web.archive.org/web/20170517144625/http://homepages.inf.ed.ac.uk/ballison/pdf/lrec_skipgrams.pdf|archive-date=17 May 2017|url-status=dead}}</ref>
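Skip-grams can be enumerated with a short sketch. It uses the common definition of a ''k''-skip-''n''-gram: ''n'' tokens kept in their original order, with at most ''k'' intervening tokens skipped in total, so ordinary ''n''-grams (zero skips) are included. The function name <code>skip_grams</code> is hypothetical, and whitespace tokenization is assumed.

```python
from itertools import combinations


def skip_grams(tokens, n, k):
    """Return all k-skip-n-grams of a token list.

    A k-skip-n-gram keeps n tokens in their original order while skipping at
    most k intervening tokens in total; ordinary n-grams (0 skips) are included.
    """
    grams = []
    for start in range(len(tokens)):
        # The last chosen token may sit at most n - 1 + k positions after start,
        # which caps the total number of skipped tokens at k.
        window = range(start + 1, min(start + n + k, len(tokens)))
        for rest in combinations(window, n - 1):
            grams.append(tuple(tokens[i] for i in (start, *rest)))
    return grams


sentence = "the rain in Spain falls mainly on the plain".split()
pairs = skip_grams(sentence, n=2, k=1)
print(len(pairs))  # 15
print(pairs[:4])   # [('the', 'rain'), ('the', 'in'), ('rain', 'in'), ('rain', 'Spain')]
```

For the sentence in the figure, the 15 pairs combine the eight ordinary bigrams with the seven pairs that skip exactly one word, such as ("the", "in").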