The likelihood function for this model is defined by:
:<math>L = \prod_{i=1}^n P(Y_i=y_i) = \prod_{i=1}^n \left( \prod_{j=1}^K P(Y_i=j)^{\delta_{j,y_i}} \right) ,</math> where the index <math>i</math> denotes the observations 1 to ''n'' and the index <math>j</math> denotes the classes 1 to ''K''. <math>\delta_{j,y_i}=\begin{cases}1 & \text{for } j=y_i \\ 0 & \text{otherwise}\end{cases}</math> is the Kronecker delta.
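As a minimal numerical sketch (with made-up class probabilities and labels), the Kronecker delta simply selects the predicted probability of the observed class, so the double product collapses to a product of one term per observation:

```python
import math

# Hypothetical predicted probabilities P(Y_i = j) for n = 3 observations
# over K = 2 classes, and observed labels y_i (1-based, as in the formula).
probs = [[0.7, 0.3],
         [0.2, 0.8],
         [0.6, 0.4]]
labels = [1, 2, 1]

# Likelihood L: product over observations of P(Y_i = y_i); the Kronecker
# delta zeroes the exponent for every class except the observed one.
L = 1.0
for p, y in zip(probs, labels):
    L *= p[y - 1]

# Negative log-likelihood written as the full double sum over i and j;
# delta_{j, y_i} keeps only the term for the observed class.
nll = -sum(math.log(p[j]) if j == y - 1 else 0.0
           for p, y in zip(probs, labels)
           for j in range(len(p)))
```

Here `nll` equals <math>-\log L</math>, confirming that the double sum reduces to the sum of the log-probabilities of the observed classes.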
The negative log-likelihood function is therefore the well-known cross-entropy:
:<math>-\log L = - \sum_{i=1}^n \sum_{j=1}^K \delta_{j,y_i} \log(P(Y_i=j)).</math>

==Application in natural language processing==