[[File:Hinge and Misclassification Loss.png|Hinge and misclassification loss functions]]
The simplest and most intuitive loss function for classification is the misclassification loss, or 0-1 loss, which is 0 if <math>f(x_i)=y_i</math> and 1 if <math>f(x_i) \neq y_i</math>, i.e. the [[Heaviside step function]] on <math>-y_if(x_i)</math>. However, this loss function is not [[convex function|convex]], which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0-1 loss. The hinge loss, <math> V(y_i,f(x_i)) = (1-y_if(x_i))_+</math> where <math>(s)_+ = \max(s,0)</math>, provides such a [[convex relaxation]]; it is convex because it is the pointwise maximum of two affine functions of <math>f(x_i)</math>. In fact, the hinge loss is the tightest convex [[upper bound]] to the 0-1 misclassification loss function,<ref name="Lee 2012 67–81"/> and, in the limit of infinite data, minimizing it returns the [[Bayes' theorem|Bayes]]-optimal solution:<ref name="Rosasco 2004 1063–1076"/><ref>{{cite journal|last=Lin|first=Yi|title=Support Vector Machines and the Bayes Rule in Classification|journal=Data Mining and Knowledge Discovery|date=July 2002|volume=6|issue=3|pages=259–275|doi=10.1023/A:1015469627679|url=http://cbio.ensmp.fr/~jvert/svn/bibli/local/Lin2002Support.pdf}}</ref>
<math>f_b(x) = \begin{cases}1 & p(1|x) > p(-1|x)\\ -1 & p(1|x) < p(-1|x)\end{cases}</math>
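As a concrete illustration of the margin effect, consider a point with <math>y_i = 1</math> and <math>f(x_i) = 0.3</math>: the prediction has the correct sign, so the 0-1 loss is 0, yet the hinge loss is <math>(1 - (1)(0.3))_+ = 0.7</math>. The hinge loss therefore also penalizes correct classifications that fall inside the margin, i.e. those with <math>0 < y_if(x_i) < 1</math>.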