Line 78:
|-
|Logistic
|<math>\frac{1}{\log(2)}\log(1+e^{-v})</math>
|<math>\frac{1}{\log(2)}\left[-\eta\log(\eta)-(1-\eta)\log(1-\eta)\right]</math>
|<math>\frac{e^v}{1+e^v}</math>
|<math>\log(\frac{\eta}{1-\eta})</math>
Line 119:
The logistic loss function is defined as
:<math>V(f(\vec{x}),y) = \frac{1}{\log(2)}\log(1+e^{-yf(\vec{x})})</math>
This function displays a similar convergence rate to the hinge loss function, and since it is continuous, [[gradient descent]] methods can be utilized. However, the logistic loss function does not assign zero penalty to any points. Instead, functions that correctly classify points with high confidence (i.e., with high values of <math>|f(\vec{x})|</math>) are penalized less. This structure leads the logistic loss function to be sensitive to outliers in the data.
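The following is a minimal numerical sketch, assuming [[NumPy]] (the function names are illustrative, not taken from any particular library), of evaluating the logistic loss and its gradient with respect to <math>f(\vec{x})</math>, which is the quantity that gradient descent updates:
<syntaxhighlight lang="python">
import numpy as np

def logistic_loss(f_x, y):
    # V(f(x), y) = (1/log 2) * log(1 + exp(-y * f(x)))
    return np.log1p(np.exp(-y * f_x)) / np.log(2)

def logistic_loss_grad(f_x, y):
    # dV/df = -y / (log(2) * (1 + exp(y * f(x))))
    return -y / (np.log(2) * (1.0 + np.exp(y * f_x)))

# Correctly classified points with large |f(x)| receive a small but
# nonzero penalty; misclassified points are penalized heavily.
print(logistic_loss(np.array([0.1, 2.0, 10.0]), 1))    # decreasing, never exactly 0
print(logistic_loss(np.array([-2.0, -10.0]), 1))       # misclassified: large loss
print(logistic_loss_grad(np.array([2.0, -2.0]), 1))    # gradient used by descent steps
</syntaxhighlight>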
Line 125:
The minimizer of <math>I[f]</math> for the logistic loss function is
:<math>f^*_\text{Logistic}= \log\left(\frac{\eta}{1-\eta}\right) = \log\left(\frac{p(1\mid x)}{1-p(1\mid x)}\right)</math>
This function is undefined when <math>p(1\mid x)=1</math> or <math>p(1\mid x)=0</math> (tending toward ∞ and −∞ respectively), but predicts a smooth curve which grows when <math>p(1\mid x)</math> increases and equals 0 when <math>p(1\mid x)= 0.5</math>.<ref name="mitlec" />
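As an illustrative check (again assuming NumPy, with a hypothetical helper name), the minimizer can be evaluated directly as the log-odds of <math>p(1\mid x)</math>:
<syntaxhighlight lang="python">
import numpy as np

def f_star_logistic(p1):
    # f*(x) = log( p(1|x) / (1 - p(1|x)) ): the log-odds of the positive class
    return np.log(p1 / (1.0 - p1))

probs = np.array([0.01, 0.25, 0.5, 0.75, 0.99])
print(f_star_logistic(probs))
# approximately [-4.60, -1.10, 0.00, 1.10, 4.60]: zero at p(1|x) = 0.5,
# diverging toward -inf / +inf as p(1|x) approaches 0 or 1
</syntaxhighlight>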
Line 133:
Using the alternative label convention <math>t=(1+y)/2</math> so that <math>t \in \{0,1\}</math>, the binary cross entropy loss is defined as
:<math>V(f(\vec{x}),t) = -t\log(\sigma(\vec{x}))-(1-t)\log(1-\sigma(\vec{x}))</math>
where we introduced the logistic sigmoid:
Line 139:
:<math>\sigma(\vec{x}) = \frac{1}{1+e^{-f(\vec{x})}}</math>
It is easy to check that the [[logistic loss]] (above) and the binary cross entropy are in fact the same (up to a multiplicative constant <math>1/\log(2)</math>).
The cross entropy loss is closely related to the [[Kullback-Leibler divergence]] between the empirical distribution and the predicted distribution. This function is not naturally represented as a product of the true label and the predicted value, but is convex and can be minimized using [[stochastic gradient descent]] methods. The cross entropy loss is ubiquitous in modern [[deep learning|deep neural networks]].
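This equivalence can be checked numerically; the sketch below (assuming NumPy, with illustrative function names) verifies that the binary cross entropy equals <math>\log(2)</math> times the logistic loss after relabeling via <math>t=(1+y)/2</math>:
<syntaxhighlight lang="python">
import numpy as np

def sigmoid(f_x):
    return 1.0 / (1.0 + np.exp(-f_x))

def bce_loss(f_x, t):
    # V(f(x), t) = -t*log(sigma) - (1 - t)*log(1 - sigma), with t in {0, 1}
    s = sigmoid(f_x)
    return -t * np.log(s) - (1 - t) * np.log(1 - s)

def logistic_loss(f_x, y):
    # V(f(x), y) = (1/log 2) * log(1 + exp(-y*f(x))), with y in {-1, +1}
    return np.log1p(np.exp(-y * f_x)) / np.log(2)

f_x = np.array([-3.0, -0.5, 0.0, 1.2, 4.0])
y = np.array([-1, 1, -1, 1, 1])
t = (1 + y) / 2  # relabel {-1, +1} as {0, 1}

# The two losses agree up to the constant factor 1/log(2).
print(np.allclose(bce_loss(f_x, t), np.log(2) * logistic_loss(f_x, y)))  # True
</syntaxhighlight>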