Loss functions for classification

== Cross entropy loss (Log Loss) ==
{{main|Cross entropy}}
Using the alternative label convention <math>t=(1+y)/2</math> so that <math>t \in \{0,1\}</math>, the binary cross entropy loss is defined as
 
:<math>V(f(\vec{x}),t) = -t\ln(\sigma(\vec{x}))-(1-t)\ln(1-\sigma(\vec{x}))</math>
 
where <math>\sigma</math> denotes the logistic sigmoid function:
 
:<math>\sigma(\vec{x}) = \frac{1}{1+\exp(-f(\vec{x}))}</math>
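
For illustration, the following is a minimal sketch (not part of the original formulation) of this definition in Python, where the variable <code>score</code> stands in for the raw classifier output <math>f(\vec{x})</math>:

<syntaxhighlight lang="python">
import math

def binary_cross_entropy(score, t):
    """Binary cross entropy for a raw score f(x) and a label t in {0, 1}."""
    p = 1.0 / (1.0 + math.exp(-score))          # logistic sigmoid sigma(x)
    return -t * math.log(p) - (1 - t) * math.log(1 - p)

# Hypothetical example: score f(x) = 2.0 with true label t = 1
print(binary_cross_entropy(2.0, 1))             # ~0.127, equal to ln(1 + e^-2)
</syntaxhighlight>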
 
The [[logistic loss]] (above) and the binary cross entropy loss are in fact the same, up to a multiplicative constant of <math>1/\ln(2)</math>.
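
Substituting the two possible values of <math>t</math> into the definition above makes this explicit:

:<math>t=1:\quad V = -\ln\sigma(\vec{x}) = \ln\left(1+e^{-f(\vec{x})}\right),</math>
:<math>t=0:\quad V = -\ln\left(1-\sigma(\vec{x})\right) = \ln\left(1+e^{f(\vec{x})}\right),</math>

so that in both cases <math>V(f(\vec{x}),t) = \ln\left(1+e^{-yf(\vec{x})}\right)</math>, which is <math>\ln(2)</math> times the logistic loss.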
 
The cross entropy loss is closely related to the [[Kullback–Leibler divergence]] between the empirical distribution and the predicted distribution. Unlike the margin-based losses above, this form of the loss is not written directly as a function of the product <math>yf(\vec{x})</math> of the true label and the predicted value, but it is convex in <math>f(\vec{x})</math> and can be minimized using [[stochastic gradient descent]] methods. The cross entropy loss is ubiquitous in modern [[deep learning|deep neural networks]].
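
As a sketch of such a minimization (an illustrative example rather than a reference implementation; the linear model <math>f(\vec{x}) = \vec{w}\cdot\vec{x}</math>, the learning rate, and the toy data below are assumptions), each stochastic gradient descent step updates <math>\vec{w}</math> using the gradient <math>(\sigma(\vec{x})-t)\,\vec{x}</math>:

<syntaxhighlight lang="python">
import math
import random

def sgd_binary_cross_entropy(data, lr=0.1, epochs=200):
    """Minimize the binary cross entropy of a linear score f(x) = w . x by SGD."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        random.shuffle(data)
        for x, t in data:                       # t in {0, 1}
            p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
            # gradient of the loss with respect to w is (sigma - t) * x
            w = [wi - lr * (p - t) * xi for wi, xi in zip(w, x)]
    return w

# Toy data (assumed for illustration): the first component acts as a bias feature
data = [([1.0, 2.0], 1), ([1.0, 3.0], 1), ([1.0, -1.5], 0), ([1.0, -2.0], 0)]
print(sgd_binary_cross_entropy(data))
</syntaxhighlight>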