This function is undefined when <math>p(1\mid x)=1</math> or <math>p(1\mid x)=0</math> (tending toward +∞ and −∞ respectively), but it is a smooth curve that grows as <math>p(1\mid x)</math> increases and equals 0 when <math>p(1\mid x)=0.5</math>.<ref name="mitlec" />
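Assuming the minimizer in question is the log-odds <math>f^*(\vec{x}) = \ln\frac{p(1\mid x)}{1-p(1\mid x)}</math> (written out here only to make the statement concrete), these properties can be read off directly:
:<math>f^*(\vec{x}) \to +\infty \text{ as } p(1\mid x) \to 1, \qquad f^*(\vec{x}) \to -\infty \text{ as } p(1\mid x) \to 0, \qquad f^*(\vec{x}) = \ln\frac{0.5}{0.5} = 0 \text{ when } p(1\mid x) = 0.5.</math>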
== Cross entropy loss (Log Loss) ==
{{main|Cross entropy}}
Using the alternative label convention <math>t=(1+y)/2</math> so that <math>t \in \{0,1\}</math>, the binary cross entropy loss is defined as
:<math>V(f(\vec{x}),t) = -t\ln(f(\vec{x}))-(1-t)\ln(1-f(\vec{x})),</math>
where <math>f(\vec{x}) \in (0,1)</math> is interpreted as the predicted probability that the label is 1.
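For example, for a true label <math>t=1</math>, a confident correct prediction and a confident wrong one (the values 0.9 and 0.1 are chosen purely for illustration) give
:<math>V(0.9,\,1) = -\ln(0.9) \approx 0.105, \qquad V(0.1,\,1) = -\ln(0.1) \approx 2.303,</math>
so confident mistakes are penalized far more heavily than confident correct predictions.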
It's easy to check that the [[logistic loss]] (above) and the binary cross entropy loss are in fact the same (up to a multiplicative constant <math>\frac{1}{\log(2)}</math>), provided the real-valued score is mapped to a probability by the logistic (sigmoid) function.
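Explicitly, writing <math>\sigma(v)=1/(1+e^{-v})</math> for the sigmoid of a real-valued score <math>v=f(\vec{x})</math> (the symbols <math>\sigma</math> and <math>v</math> are introduced here only for this verification), the two possible labels give
:<math>t=1\ (y=1):\quad -\ln\sigma(v) = \ln\left(1+e^{-v}\right), \qquad t=0\ (y=-1):\quad -\ln\left(1-\sigma(v)\right) = \ln\left(1+e^{v}\right),</math>
so in either case the cross entropy equals <math>\ln\left(1+e^{-yv}\right)</math>, which is exactly <math>\log(2)</math> times the logistic loss.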
The cross entropy loss is closely related to the [[Kullback-Leibler divergence]] between the empirical distribution and the predicted distribution. This function is not naturally expressed as a function of the margin <math>yf(\vec{x})</math> (the product of the true label and the predicted value), but it is convex and can be minimized using [[stochastic gradient descent]] methods. The cross entropy loss is ubiquitous in modern [[deep learning|deep neural networks]].
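As a minimal illustration (the synthetic data, linear model, learning rate, and variable names below are chosen only for this sketch and do not come from any particular source), the binary cross entropy of a linear score can be minimized with stochastic gradient descent:
<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

# Synthetic binary classification data with labels t in {0, 1}
# (all constants here are illustrative).
n, d = 1000, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.5, -2.0, 0.5])
t = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ true_w))).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(p, t):
    # -t*ln(p) - (1-t)*ln(1-p), averaged over the samples;
    # clipping avoids log(0) for saturated predictions.
    eps = 1e-12
    p = np.clip(p, eps, 1.0 - eps)
    return float(np.mean(-t * np.log(p) - (1.0 - t) * np.log(1.0 - p)))

# Stochastic gradient descent on the weights w of the linear score w . x.
w = np.zeros(d)
learning_rate = 0.1
for epoch in range(20):
    for i in rng.permutation(n):
        p_i = sigmoid(X[i] @ w)
        # Per-sample gradient of the cross entropy w.r.t. w is (p_i - t_i) * x_i.
        w -= learning_rate * (p_i - t[i]) * X[i]

print(binary_cross_entropy(sigmoid(X @ w), t))
</syntaxhighlight>
Each update uses the per-sample gradient <math>(\sigma(w^{\top}x)-t)\,x</math>, which is what makes this loss convenient for stochastic gradient methods.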