</math>. This holds even for nonconvex loss functions, which means that gradient descent based algorithms such as [[Gradient boosting|gradient boosting]] can be used to construct the minimizer in practice with finite training samples.
== Square loss ==
While more commonly used in regression, the square loss function can be re-written as a function <math>\phi(yf(\vec{x}))</math> and utilized for classification.
:<math>V(f(\vec{x}),y) = (1-yf(\vec{x}))^2</math>
A benefit of the square loss function is that its structure lends itself to easy cross validation of regularization parameters. Specifically for [[Tikhonov regularization]], one can solve for the regularization parameter using leave-one-out [[cross-validation (statistics) |cross-validation]] in the same time as it would take to solve a single problem.<ref>{{Citation| last= Rifkin| first= Ryan M.| last2= Lippert| first2= Ross A.| title= Notes on Regularized Least Squares| publisher= MIT Computer Science and Artificial Intelligence Laboratory| date= 1 May 2007|url=https://dspace.mit.edu/bitstream/handle/1721.1/37318/MIT-CSAIL-TR-2007-025.pdf?sequence=1}}</ref>
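The shortcut can be illustrated numerically. The following is a minimal sketch (the data, variable names, and regularization value are invented for the example): for Tikhonov-regularized least squares, the leave-one-out residuals can be read directly off the diagonal of the hat matrix <math>H = X(X^TX+\lambda I)^{-1}X^T</math> as <math>(y_i - \hat{y}_i)/(1 - H_{ii})</math>, matching explicit retraining with each point held out.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 30, 5, 0.5                      # illustrative sizes and regularization
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n))

# Full ridge fit; H is the hat matrix mapping y to fitted values.
A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
H = X @ A_inv @ X.T
y_hat = H @ y

# Closed-form leave-one-out residuals: (y_i - y_hat_i) / (1 - H_ii).
loo_shortcut = (y - y_hat) / (1.0 - np.diag(H))

# Brute force: retrain n times, each time holding out one sample.
loo_brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    w_i = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(d),
                          X[mask].T @ y[mask])
    loo_brute[i] = y[i] - X[i] @ w_i

assert np.allclose(loo_shortcut, loo_brute)
```

Since the hat matrix is computed once, sweeping over candidate values of the regularization parameter costs little more than a single fit, which is the point made by Rifkin and Lippert.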
The minimizer of <math>I[f]</math> for the square loss function is
:<math>f^*_\text{Square}= 2\eta-1=2p(1\mid x)-1</math>
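This minimizer can be derived by minimizing the conditional expected loss pointwise. A brief sketch, writing <math>\eta = p(1\mid x)</math> for the conditional probability of the positive class:
:<math>\mathbb{E}\left[(1-yf(\vec{x}))^2 \mid \vec{x}\right] = \eta\,(1-f(\vec{x}))^2 + (1-\eta)\,(1+f(\vec{x}))^2.</math>
Setting the derivative with respect to <math>f(\vec{x})</math> to zero gives
:<math>-2\eta\,(1-f(\vec{x})) + 2(1-\eta)\,(1+f(\vec{x})) = 0 \quad\Longrightarrow\quad f(\vec{x}) = 2\eta - 1,</math>
and since the conditional loss is convex in <math>f(\vec{x})</math>, this stationary point is the minimizer.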
== Hinge loss ==
The hinge loss provides a relatively tight, convex upper bound on the 0–1 [[indicator function]]. Specifically, the hinge loss equals the 0–1 [[indicator function]] when <math>\operatorname{sgn}(f(\vec{x})) = y</math> and <math>|yf(\vec{x})| \geq 1</math>. In addition, the empirical risk minimization of this loss is equivalent to the classical formulation for [[support vector machines]] (SVMs). Correctly classified points lying outside the margin boundaries of the support vectors are not penalized, whereas points within the margin boundaries or on the wrong side of the hyperplane are penalized linearly with their distance from the correct boundary.<ref name="Utah" />
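These properties can be checked numerically. Below is a minimal sketch (the function names and evaluation grid are illustrative) comparing the hinge loss with the 0–1 loss as functions of the margin <math>yf(\vec{x})</math>:

```python
import numpy as np

def hinge(margin):
    # Hinge loss as a function of the margin yf(x).
    return np.maximum(0.0, 1.0 - margin)

def zero_one(margin):
    # 0-1 loss; the convention here counts margin = 0 as an error.
    return (margin <= 0).astype(float)

margins = np.linspace(-2, 2, 401)
assert np.all(hinge(margins) >= zero_one(margins))  # convex upper bound
assert np.all(hinge(margins)[margins >= 1] == 0.0)  # no penalty outside the margin
```

The first assertion confirms the upper-bound property on the whole grid; the second confirms that correctly classified points with margin at least 1 incur no penalty.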
While the hinge loss function is both convex and continuous, it is not smooth (it is not differentiable at <math>yf(\vec{x}) = 1</math>).
The minimizer of <math>I[f]</math> for the hinge loss function is
:<math>f^*_\text{Hinge}(\vec{x}) \;=\; \begin{cases} 1& \text{if }p(1\mid\vec{x}) > p(-1\mid\vec{x}) \\ -1 & \text{if }p(1\mid\vec{x}) < p(-1\mid\vec{x}) \end{cases}</math>
when <math>p(1\mid x) \ne 0.5</math>, which matches that of the 0–1 indicator function. This conclusion makes the hinge loss quite attractive, as bounds can be placed on the difference between the expected risk and the sign of the hinge loss function.<ref name="mit" /> The hinge loss cannot be derived from (2) since <math>f^*_{\text{Hinge}}</math> is not invertible.
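One way to verify this form is to minimize the conditional risk pointwise. A sketch, writing <math>p = p(1\mid\vec{x})</math>:
:<math>\mathbb{E}\left[\max(0, 1-yf(\vec{x})) \mid \vec{x}\right] = p\max(0, 1-f(\vec{x})) + (1-p)\max(0, 1+f(\vec{x})).</math>
On the interval <math>[-1,1]</math> this equals <math>1 + f(\vec{x})(1-2p)</math>, which is minimized at <math>f(\vec{x}) = 1</math> when <math>p > \tfrac{1}{2}</math> and at <math>f(\vec{x}) = -1</math> when <math>p < \tfrac{1}{2}</math>; outside this interval the conditional risk only grows, since it is decreasing for <math>f(\vec{x}) \le -1</math> and increasing for <math>f(\vec{x}) \ge 1</math>.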
== Generalized Smooth Hinge loss ==