{{DISPLAYTITLE:word ''n''-gram language model}}
A '''word ''n''-gram language model''' is a purely statistical model of language: it predicts each word from a fixed-size window of the preceding ''n'' − 1 words.
== Unigram model ==
| ... || ... || ...
|}
== Bigram model ==
In a bigram (''n'' = 2) language model, the probability of the sentence ''I saw the red house'' is approximated as
<math display="block">P(\text{I, saw, the, red, house}) \approx P(\text{I}\mid\langle s\rangle) P(\text{saw}\mid \text{I}) P(\text{the}\mid\text{saw}) P(\text{red}\mid\text{the}) P(\text{house}\mid\text{red}) P(\langle /s\rangle\mid \text{house})</math>
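The bigram approximation can be sketched with maximum-likelihood count ratios over a toy corpus. This is a minimal illustrative sketch; the corpus and function names are hypothetical, not part of the article:

```python
from collections import Counter

# Toy corpus of pre-tokenized sentences; <s> and </s> are the
# start- and end-of-sentence markers (hypothetical example data).
corpus = [
    ["<s>", "I", "saw", "the", "red", "house", "</s>"],
    ["<s>", "I", "saw", "the", "dog", "</s>"],
]

unigram_counts = Counter(w for sent in corpus for w in sent)
bigram_counts = Counter(
    (a, b) for sent in corpus for a, b in zip(sent, sent[1:])
)

def bigram_sentence_prob(words):
    """P(sentence) ~ product of P(w_i | w_{i-1}), each estimated as
    count(w_{i-1}, w_i) / count(w_{i-1})."""
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for a, b in zip(tokens, tokens[1:]):
        p *= bigram_counts[(a, b)] / unigram_counts[a]
    return p
```

For instance, `bigram_sentence_prob(["I", "saw", "the", "red", "house"])` multiplies the six conditional factors shown in the equation above, including the start- and end-of-sentence transitions.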
== Trigram model ==
In a trigram (''n'' = 3) language model, the approximation is
<math display="block">P(\text{I, saw, the, red, house}) \approx P(\text{I}\mid \langle s\rangle,\langle s\rangle) P(\text{saw}\mid\langle s\rangle,\text{I}) P(\text{the}\mid\text{I, saw}) P(\text{red}\mid\text{saw, the}) P(\text{house}\mid\text{the, red}) P(\langle /s\rangle\mid\text{red, house})</math>
Note that the context of the first ''n'' – 1 ''n''-grams is filled with start-of-sentence markers, typically denoted <nowiki><s></nowiki>.
Additionally, without an end-of-sentence marker, the probability of an ungrammatical sequence ''*I saw the'' would always be higher than that of the longer sentence ''I saw the red house,'' since every additional conditional factor is at most 1 and can only lower the product.
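The padding convention described above, ''n'' − 1 start markers and a single end marker, can be made concrete with a small helper. This is a hypothetical sketch, not code from the article:

```python
def pad_sentence(words, n):
    """Pad a token list so that the contexts of the first n-1 n-grams
    are filled with <s> markers, and a single </s> closes the sentence."""
    return ["<s>"] * (n - 1) + list(words) + ["</s>"]
```

With `n = 3`, `pad_sentence(["I", "saw"], 3)` yields `["<s>", "<s>", "I", "saw", "</s>"]`, matching the two start-of-sentence markers conditioning the first word in the trigram equation.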
== Approximation method ==
The probability <math>P(w_1,\ldots,w_m)</math> of observing the sentence <math>w_1,\ldots,w_m</math> is approximated as
<math display="block">P(w_1,\ldots,w_m) = \prod^m_{i=1} P(w_i\mid w_1,\ldots,w_{i-1})\approx \prod^m_{i=2} P(w_i\mid w_{i-(n-1)},\ldots,w_{i-1})</math>
It is assumed that the probability of observing the ''i''<sup>th</sup> word ''w<sub>i</sub>'' in the context history of the preceding ''i'' − 1 words can be approximated by the probability of observing it in the shortened context history of the preceding ''n'' − 1 words (''n''<sup>th</sup>-order [[Markov property]]). To clarify, for ''n'' = 3 and ''i'' = 2 we have <math>P(w_i\mid w_{i-(n-1)},\ldots,w_{i-1})=P(w_2\mid w_1)</math>.
The conditional probability can be calculated from ''n''-gram model frequency counts:
<math display="block">P(w_i\mid w_{i-(n-1)},\ldots,w_{i-1}) = \frac{\mathrm{count}(w_{i-(n-1)},\ldots,w_{i-1},w_i)}{\mathrm{count}(w_{i-(n-1)},\ldots,w_{i-1})}</math>
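The count-ratio estimate above can be computed directly from a token stream. The function names and the toy tokens below are hypothetical illustrations:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all length-n windows (n-grams) in a token stream."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cond_prob(context, word, tokens):
    """MLE estimate P(word | context) = count(context, word) / count(context),
    where context is a tuple of the preceding n-1 words."""
    n = len(context) + 1
    numerator = ngram_counts(tokens, n)[tuple(context) + (word,)]
    denominator = ngram_counts(tokens, n - 1)[tuple(context)]
    return numerator / denominator if denominator else 0.0
```

For example, on the token stream `["a", "b", "a", "b", "a", "c"]`, the bigram estimate `cond_prob(("a",), "b", tokens)` is count(a, b) / count(a) = 2/3.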
==References==