:<math>f^*_\text{Square}= 2\eta-1=2p(1\mid x)-1</math>
<br />
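As a short check of this expression (writing <math>p \equiv p(1\mid x)</math>, and assuming the margin-based form <math>V(f(\vec{x}),y)=(1-yf(\vec{x}))^2</math> of the square loss), the conditional risk is
:<math>C(f) = p\,(1-f)^2 + (1-p)\,(1+f)^2,</math>
and setting <math>C'(f) = -2p(1-f) + 2(1-p)(1+f) = 0</math> gives <math>f = 2p-1</math>, in agreement with the minimizer above.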
== Logistic loss ==
It penalizes incorrect predictions more than the hinge loss and has a larger gradient.
== Hinge loss ==
{{main|Hinge loss}}
The hinge loss function is defined as
:<math>V(f(\vec{x}),y) = \max(0, 1-yf(\vec{x})) = |1 - yf(\vec{x}) |_{+}.</math>
The hinge loss provides a relatively tight, convex upper bound on the 0–1 [[indicator function]]. Specifically, the hinge loss equals the 0–1 [[indicator function]] when <math>\operatorname{sgn}(f(\vec{x})) = y</math> and <math>|yf(\vec{x})| \geq 1</math>. In addition, the empirical risk minimization of this loss is equivalent to the classical formulation for [[support vector machines]] (SVMs). Correctly classified points lying outside the margin boundaries of the support vectors are not penalized, whereas points within the margin boundaries or on the wrong side of the hyperplane are penalized linearly in proportion to their distance from the correct boundary.<ref name="Utah" />
While the hinge loss function is both convex and continuous, it is not smooth (is not differentiable) at <math>yf(\vec{x})=1</math>. Consequently, the hinge loss function cannot be used with [[gradient descent]] methods or [[stochastic gradient descent]] methods, which rely on differentiability over the entire ___domain. However, the hinge loss does have a subgradient at <math>yf(\vec{x})=1</math>, which allows for the use of [[subgradient method|subgradient descent methods]].<ref name="Utah" /> SVMs utilizing the hinge loss function can also be solved using [[quadratic programming]].
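As an illustration of the subgradient approach described above, the following is a minimal sketch (assuming a linear scorer <math>f(\vec{x}) = \vec{w}\cdot\vec{x}</math> and synthetic data; the function names are illustrative and no regularization term is included) that minimizes the average hinge loss by subgradient descent.
<syntaxhighlight lang="python">
import numpy as np

def hinge_loss(w, X, y):
    # Average hinge loss max(0, 1 - y_i * (w . x_i)) over the sample.
    margins = y * (X @ w)
    return np.maximum(0.0, 1.0 - margins).mean()

def hinge_subgradient(w, X, y):
    # One valid subgradient of the average hinge loss with respect to w:
    # -y_i * x_i for points with margin below 1, and 0 otherwise
    # (0 is also a valid choice at margin exactly 1).
    margins = y * (X @ w)
    active = (margins < 1.0).astype(float)
    return -(active * y) @ X / len(y)

# Synthetic data with labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1.0, -1.0)

# Plain subgradient descent with a decaying step size.
w = np.zeros(2)
for t in range(1, 501):
    w -= (1.0 / np.sqrt(t)) * hinge_subgradient(w, X, y)

print(hinge_loss(w, X, y))
</syntaxhighlight>
The <math>1/\sqrt{t}</math> step size is a standard choice for subgradient methods; because the loss is not differentiable at margin 1, any element of the subdifferential may be used at that point.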
The minimizer of <math>I[f]</math> for the hinge loss function is
:<math>f^*_\text{Hinge}(\vec{x}) \;=\; \begin{cases} 1& \text{if }p(1\mid\vec{x}) > p(-1\mid\vec{x}) \\ -1 & \text{if }p(1\mid\vec{x}) < p(-1\mid\vec{x}) \end{cases}</math>
when <math>p(1\mid\vec{x}) \ne 0.5</math>, which matches that of the 0–1 indicator function. This conclusion makes the hinge loss quite attractive, as bounds can be placed on the difference between the expected risk and the sign of the hinge loss function.<ref name="mit" /> The hinge loss cannot be derived from (2) since <math>f^*_{\text{Hinge}}</math> is not invertible.
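A brief verification of this minimizer (writing <math>p \equiv p(1\mid\vec{x})</math>): for <math>f \in [-1,1]</math> the conditional risk of the hinge loss is
:<math>C(f) = p\,(1-f) + (1-p)\,(1+f) = 1 + f\,(1-2p),</math>
which is linear in <math>f</math> and hence minimized at <math>f = 1</math> when <math>p > \tfrac12</math> and at <math>f = -1</math> when <math>p < \tfrac12</math>; moving <math>f</math> outside <math>[-1,1]</math> cannot decrease the risk, since it only enlarges one of the two terms.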
== Generalized smooth hinge loss ==
The generalized smooth hinge loss function with parameter <math>\alpha</math> is defined as
:<math>f^*_\alpha(z) \;=\; \begin{cases} \frac{\alpha}{\alpha + 1} - z & \text{if }z \le 0 \\ \frac{1}{\alpha + 1}z^{\alpha + 1} - z + \frac{\alpha}{\alpha + 1} & \text{if } 0<z<1 \\ 0 & \text{if } z \geq 1 \end{cases},</math>
where
:<math>z = yf(\vec{x}).</math>
It is monotonically decreasing and reaches zero when <math>z = 1</math>.
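The following is a small numerical sketch of this piecewise definition (the function name and the comparison are illustrative only); for <math>\alpha = 1</math> the middle piece reduces to <math>\tfrac{1}{2}(1-z)^2</math>, the ordinary smooth hinge.
<syntaxhighlight lang="python">
import numpy as np

def generalized_smooth_hinge(z, alpha):
    # Piecewise definition in the margin z = y * f(x), as given above:
    #   alpha/(alpha+1) - z                             for z <= 0
    #   z**(alpha+1)/(alpha+1) - z + alpha/(alpha+1)    for 0 < z < 1
    #   0                                               for z >= 1
    z = np.asarray(z, dtype=float)
    a = float(alpha)
    zc = np.clip(z, 0.0, 1.0)  # safe base for the fractional power
    middle = zc ** (a + 1.0) / (a + 1.0) - z + a / (a + 1.0)
    return np.where(z <= 0.0, a / (a + 1.0) - z,
                    np.where(z < 1.0, middle, 0.0))

zs = np.linspace(-2.0, 2.0, 9)
print(generalized_smooth_hinge(zs, alpha=1.0))   # smooth hinge (alpha = 1)
print(np.maximum(0.0, 1.0 - zs))                 # ordinary hinge, for comparison
</syntaxhighlight>
As <math>\alpha</math> increases, the curve approaches the ordinary hinge loss while keeping a continuous first derivative at <math>z = 0</math> and <math>z = 1</math>.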
== References ==