Loss functions for classification

The minimizer of <math>I[f]</math> for the square loss function is
:<math>f^*_\text{Square}= 2\eta-1=2p(1\mid x)-1</math>
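
This minimizer can be checked numerically. The following sketch is illustrative only, not part of the derivation: it assumes the margin form <math>V(f(\vec{x}),y)=(1-yf(\vec{x}))^2</math> of the square loss and minimizes the conditional risk <math>\eta\,\phi(f)+(1-\eta)\,\phi(-f)</math> directly; the helper name <code>conditional_risk</code> is hypothetical.

<syntaxhighlight lang="python">
from scipy.optimize import minimize_scalar

def conditional_risk(f, eta):
    # Conditional expected square loss at a point x with eta = p(1|x):
    # eta * (1 - f)^2 + (1 - eta) * (1 + f)^2
    return eta * (1.0 - f) ** 2 + (1.0 - eta) * (1.0 + f) ** 2

for eta in (0.1, 0.3, 0.5, 0.9):
    res = minimize_scalar(conditional_risk, args=(eta,))
    # The numeric minimizer should agree with the closed form 2*eta - 1.
    print(f"eta={eta:.1f}  numeric minimizer={res.x:+.4f}  2*eta-1={2 * eta - 1:+.4f}")
</syntaxhighlight>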
 
 
== Logistic loss ==
The logistic loss function can be generated using (2) and Table-I as follows:
 
:<math>\begin{align}
\phi(v) &= C[f^{-1}(v)]+\left(1-f^{-1}(v)\right)C'[f^{-1}(v)] \\
&= \frac{1}{\log(2)}\left[\frac{-e^v}{1+e^v}\log\left(\frac{e^v}{1+e^v}\right)-\left(1-\frac{e^v}{1+e^v}\right)\log\left(1-\frac{e^v}{1+e^v}\right)\right]+\left(1-\frac{e^v}{1+e^v}\right)\left[\frac{-1}{\log(2)}\log\left(\frac{e^v/(1+e^v)}{1-e^v/(1+e^v)}\right)\right] \\
&= \frac{1}{\log(2)}\log\left(1+e^{-v}\right)
\end{align}</math>
:<math>V(f(\vec{x}),y) = \frac{1}{\log 2}\log(1+e^{-yf(\vec{x})})</math>
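
The closed form can be verified numerically. The sketch below is illustrative only: it codes the quantities appearing in the display above, with <math>C(\eta)</math> the binary entropy in bits, <math>C'(\eta)=-\tfrac{1}{\log 2}\log\tfrac{\eta}{1-\eta}</math>, and <math>f^{-1}(v)=\tfrac{e^v}{1+e^v}</math>; the function names are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

LOG2 = np.log(2.0)

def C(eta):
    # Minimal conditional risk (binary entropy in bits), as in the display above:
    # C(eta) = (1/log 2) * (-eta*log(eta) - (1-eta)*log(1-eta))
    return (-eta * np.log(eta) - (1 - eta) * np.log(1 - eta)) / LOG2

def C_prime(eta):
    # Derivative of C: C'(eta) = -(1/log 2) * log(eta / (1-eta))
    return -np.log(eta / (1 - eta)) / LOG2

def f_inv(v):
    # Inverse link f^{-1}(v) = e^v / (1 + e^v), the logistic sigmoid
    return 1.0 / (1.0 + np.exp(-v))

def phi_generated(v):
    # phi(v) = C[f^{-1}(v)] + (1 - f^{-1}(v)) * C'[f^{-1}(v)]
    eta = f_inv(v)
    return C(eta) + (1 - eta) * C_prime(eta)

def phi_closed_form(v):
    # phi(v) = (1/log 2) * log(1 + e^{-v})
    return np.log1p(np.exp(-v)) / LOG2

v = np.linspace(-5.0, 5.0, 11)
print(np.allclose(phi_generated(v), phi_closed_form(v)))  # True
</syntaxhighlight>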
 
This function displays a convergence rate similar to that of the hinge loss function, and since it is continuously differentiable, [[gradient descent]] methods can be used. However, the logistic loss function does not assign zero penalty to any point. Instead, functions that correctly classify points with high confidence (i.e., with high values of <math>|f(\vec{x})|</math>) are penalized less. This structure leads the logistic loss function to be sensitive to outliers in the data.
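
The contrast with the hinge loss can be made concrete with a small numeric comparison. This sketch is illustrative only; it assumes the margin forms of both losses, with the hinge loss taken as the standard <math>\max(0,1-yf(\vec{x}))</math>.

<syntaxhighlight lang="python">
import numpy as np

LOG2 = np.log(2.0)

def logistic_loss(margin):
    # (1/log 2) * log(1 + e^{-y f(x)}), written in terms of the margin y*f(x)
    return np.log1p(np.exp(-margin)) / LOG2

def hinge_loss(margin):
    # Standard hinge loss max(0, 1 - y*f(x)), shown for comparison
    return np.maximum(0.0, 1.0 - margin)

for m in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(f"margin {m:4.1f}: hinge={hinge_loss(m):.4f}  logistic={logistic_loss(m):.6f}")
</syntaxhighlight>

For margins at or above 1 the hinge loss is exactly zero, while the logistic loss stays positive everywhere and only decays toward zero as the margin <math>yf(\vec{x})</math> grows.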