The likelihood function for this model is defined by:
:<math>L = \prod_{i=1}^n P(Y_i=y_i) = \prod_{i=1}^n \left( \prod_{j=1}^K P(Y_i=j)^{\delta_{j,y_i}} \right) ,</math> where the index <math>i</math> denotes the observations 1 to ''n'' and the index <math>j</math> denotes the classes 1 to ''K''. <math>\delta_{j,y_i}=\begin{cases}1 & \text{for } j=y_i \\ 0 & \text{otherwise}\end{cases}</math> is the Kronecker delta.
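As a minimal numerical sketch (with made-up class probabilities and labels), the Kronecker delta simply selects the predicted probability of the observed class, so the double product collapses to a product of one term per observation:

```python
import math

# Hypothetical predicted probabilities P(Y_i = j) for n = 3 observations
# over K = 2 classes, and observed labels y_i (1-based, as in the formula).
probs = [[0.7, 0.3],
         [0.2, 0.8],
         [0.6, 0.4]]
labels = [1, 2, 1]

# Likelihood L: product over observations of P(Y_i = y_i); the Kronecker
# delta zeroes the exponent for every class except the observed one.
L = 1.0
for p, y in zip(probs, labels):
    L *= p[y - 1]

# Negative log-likelihood written as the full double sum over i and j;
# delta_{j, y_i} keeps only the term for the observed class.
nll = -sum(math.log(p[j]) if j == y - 1 else 0.0
           for p, y in zip(probs, labels)
           for j in range(len(p)))
```

Here `nll` equals <math>-\log L</math>, confirming that the double sum reduces to the sum of the log-probabilities of the observed classes.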
The negative log-likelihood function is therefore the well-known cross-entropy:
:<math>-\log L = - \sum_{i=1}^n \sum_{j=1}^K \delta_{j,y_i} \log(P(Y_i=j)).</math>

==Application in natural language processing==