Aimefournier (talk | contribs) m →As a log-linear model: Use fewer letters (no c, i, n), to improve consistency with other sections. |
Greatgalaxy (talk | contribs) Link suggestions feature: 3 links added. |
Line 190:
where <math>\varepsilon_k \sim \operatorname{EV}_1(0,1),</math> i.e. a standard type-1 [[extreme value distribution]].
This latent variable can be thought of as the [[utility]] associated with data point ''i'' choosing outcome ''k'', where the randomness in the utility actually obtained accounts for other unmodeled factors that go into the choice. The value of the observed variable <math>Y_i</math> is then determined in a non-random fashion from these latent variables (i.e. the randomness has been moved from the observed outcomes into the latent variables): outcome ''k'' is chosen [[if and only if]] its associated utility (the value of <math>Y_{i,k}^{\ast}</math>) is greater than the utilities of all the other choices, i.e. if it is the maximum of all the utilities. Since the latent variables are [[continuous variable|continuous]], the probability of two of them having exactly the same value is 0, so ties can be ignored. That is:
: <math>
Line 231:
The observed values <math>y_i \in \{1,\dots,K\}</math> for <math>i=1,\dots,n</math> of the explained variables are considered as realizations of stochastically independent, [[Categorical distribution|categorically distributed]] random variables <math>Y_1,\dots, Y_n</math>.
The [[likelihood function]] for this model is defined by
:<math>L = \prod_{i=1}^n P(Y_i=y_i) = \prod_{i=1}^n \prod_{j=1}^K P(Y_i=j)^{\delta_{j,y_i}},</math>
where the index <math>i</math> runs over the observations 1 to ''n'', the index <math>j</math> runs over the classes 1 to ''K'', and <math>\delta_{j,y_i}=\begin{cases}1, & \text{if } j=y_i \\ 0, & \text{otherwise}\end{cases}</math> is the [[Kronecker delta]].
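As a minimal numerical sketch of the likelihood above, the following Python snippet evaluates both forms of the product and checks that they agree. The probability matrix <code>P</code>, the label vector <code>y</code> and the function names are hypothetical, chosen only for illustration; in a fitted model the entries of <code>P</code> would come from the estimated class probabilities.

```python
import numpy as np

# Hypothetical toy data (an assumption for illustration): n = 4 observations, K = 3 classes.
# P[i, j-1] stands in for P(Y_i = j); each row sums to 1.
P = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.30, 0.40],
    [0.50, 0.25, 0.25],
])
y = np.array([1, 2, 3, 1])  # observed classes y_i in {1, ..., K}


def likelihood_direct(P, y):
    """First form: product over i of P(Y_i = y_i)."""
    n = P.shape[0]
    # y - 1 converts the 1-based class labels to 0-based column indices.
    return float(np.prod(P[np.arange(n), y - 1]))


def likelihood_delta(P, y):
    """Second form: product over i and j of P(Y_i = j) ** delta_{j, y_i}."""
    K = P.shape[1]
    classes = np.arange(1, K + 1)
    # delta[i, j-1] is 1 where j equals y_i and 0 otherwise (Kronecker delta).
    delta = (classes[None, :] == y[:, None]).astype(float)
    return float(np.prod(P ** delta))


L_direct = likelihood_direct(P, y)
L_delta = likelihood_delta(P, y)
```

For this toy data both forms give 0.7 × 0.8 × 0.4 × 0.5 = 0.112 up to floating-point rounding, because an exponent of 0 turns every factor for a non-observed class into 1. For realistic sample sizes the product underflows quickly, which is one practical reason to work with the logarithm of the likelihood instead.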
The negative log-likelihood function is therefore the well-known cross-entropy:
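Taking the negative logarithm of the likelihood above turns the double product into a double sum, in which the Kronecker delta keeps only the term for the observed class of each observation:
:<math>-\log L = -\sum_{i=1}^n \sum_{j=1}^K \delta_{j,y_i} \log P(Y_i=j) = -\sum_{i=1}^n \log P(Y_i=y_i).</math>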