No edit summary |
Linking. |
Line 19:
<math display="block"> P(w_m \mid w_1,\ldots,w_{m-1}) = \frac{1}{Z(w_1,\ldots,w_{m-1})} \exp (a^T f(w_1,\ldots,w_m))</math>
where <math>Z(w_1,\ldots,w_{m-1})</math> is the [[Partition function (mathematics)|partition function]], which normalizes the distribution over candidate words <math>w_m</math>; <math>a</math> is the parameter vector; and <math>f(w_1,\ldots,w_m)</math> is the feature function. In the simplest case, the feature function is just an indicator of the presence of a certain ''n''-gram. Placing a prior on <math>a</math>, or using some other form of [[Regularization (mathematics)|regularization]], is helpful for limiting overfitting.
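As an illustration, the conditional probability above can be sketched in Python for the simplest case of indicator ''n''-gram features. This is a minimal sketch under stated assumptions, not a reference implementation: the names <code>features</code>, <code>cond_prob</code>, <code>ngrams</code> and the toy vocabulary are hypothetical, and the parameter vector <code>a</code> is set by hand rather than learned.

```python
import math

def features(context, word, ngrams):
    """Indicator feature vector f(w_1, ..., w_m).

    Component i is 1.0 if the i-th n-gram in `ngrams` (a tuple of words)
    matches the end of the sequence context + (word,), else 0.0.
    """
    seq = tuple(context) + (word,)
    return [1.0 if seq[-len(g):] == g else 0.0 for g in ngrams]

def cond_prob(word, context, vocab, ngrams, a):
    """P(word | context) = exp(a^T f) / Z(context).

    Z(context) sums exp(a^T f) over every candidate word in `vocab`.
    """
    def score(w):
        f = features(context, w, ngrams)
        return sum(a_i * f_i for a_i, f_i in zip(a, f))

    scores = {w: score(w) for w in vocab}
    # Subtracting the maximum score is only for numerical stability;
    # the shift cancels between the numerator and Z.
    shift = max(scores.values())
    z = sum(math.exp(s - shift) for s in scores.values())
    return math.exp(scores[word] - shift) / z

# Toy setup (all values are illustrative assumptions).
vocab = ["cat", "dog", "sat"]
ngrams = [("the", "cat"), ("dog",)]   # one bigram and one unigram feature
a = [1.0, 0.5]                        # hand-picked parameter vector

probs = {w: cond_prob(w, ("the",), vocab, ngrams, a) for w in vocab}
```

With these hand-picked weights, "cat" receives the largest probability after the context "the" because the bigram feature fires with the largest weight; setting every entry of <code>a</code> to zero gives a uniform distribution over the vocabulary.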
The log-bilinear model is another example of an exponential language model.