Word n-gram language model: Difference between revisions

<math display="block">P(w_i\mid w_{i-(n-1)},\ldots,w_{i-1}) = \frac{\mathrm{count}(w_{i-(n-1)},\ldots,w_{i-1},w_i)}{\mathrm{count}(w_{i-(n-1)},\ldots,w_{i-1})}</math>
 
The terms '''bigram''' and '''trigram''' language models denote ''n''-gram models with ''n''&nbsp;=&nbsp;2 and ''n''&nbsp;=&nbsp;3, respectively.
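The maximum-likelihood estimate above can be sketched for the bigram case (''n''&nbsp;=&nbsp;2) as follows; this is an illustrative sketch assuming a whitespace-tokenized corpus, not part of the article:

```python
from collections import Counter

def bigram_mle(corpus_tokens):
    """Estimate P(w_i | w_{i-1}) as count(w_{i-1}, w_i) / count(w_{i-1})."""
    # count(w_{i-1}): each token except the last can serve as a context
    context_counts = Counter(corpus_tokens[:-1])
    # count(w_{i-1}, w_i): adjacent token pairs
    pair_counts = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return {pair: c / context_counts[pair[0]] for pair, c in pair_counts.items()}

tokens = "the cat sat on the mat".split()
probs = bigram_mle(tokens)
# "the" occurs twice as a context and is followed by "cat" once,
# so P(cat | the) = 1/2
```

Note that any pair absent from the corpus simply has no entry in the resulting dictionary, which is the zero-probability problem that smoothing (discussed below) addresses.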
 
Typically, the ''n''-gram model probabilities are not derived directly from frequency counts, because a model derived this way assigns zero probability to any ''n''-gram not seen in training, and hence to every sentence containing one. Instead, some form of smoothing is necessary, assigning some of the total probability mass to unseen words or ''n''-grams. Various methods are used, from simple "add-one" smoothing (assign a count of 1 to unseen ''n''-grams, as an [[uninformative prior]]) to more sophisticated models, such as [[Good–Turing discounting]] or [[Katz's back-off model|back-off model]]s.