[[File:Hinge and Misclassification Loss.png|thumb|Hinge and misclassification loss functions]]
The simplest and most intuitive loss function for categorization is the misclassification loss, or 0-1 loss, which is 0 if <math>f(x_i)=y_i</math> and 1 if <math>f(x_i) \neq y_i</math>, i.e. the [[Heaviside step function]] applied to <math>-y_if(x_i)</math>. However, this loss function is not [[convex function|convex]], which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0-1 loss. The hinge loss, <math> V(y_i,f(x_i)) = (1-y_if(x_i))_+</math>, where <math>(s)_+ = \max(s,0)</math>, provides such a [[convex relaxation]]. In fact, the hinge loss is the tightest convex [[upper bound]] to the 0-1 misclassification loss function,<ref name="Lee 2012 67–81"/> and with infinite data returns the [[Bayes' theorem|Bayes]]-optimal solution:<ref name="Rosasco 2004 1063–1076"/><ref>{{cite journal|last=Lin|first=Yi|title=Support Vector Machines and the Bayes Rule in Classification|journal=Data Mining and Knowledge Discovery|date=July 2002|volume=6|issue=3|pages=259–275|doi=10.1023/A:1015469627679|url=http://cbio.ensmp.fr/~jvert/svn/bibli/local/Lin2002Support.pdf}}</ref>
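As a minimal numerical sketch (illustrative only, not drawn from the cited sources, and assuming [[NumPy]] and real-valued classifier scores <math>f(x_i)</math>), the two losses can be computed and compared directly:

<syntaxhighlight lang="python">
import numpy as np

def zero_one_loss(y, fx):
    # 0-1 loss: 1 for each misclassified point, i.e. whenever y*f(x) <= 0
    # (the Heaviside step of -y*f(x); the value at exactly 0 is a convention).
    return (y * fx <= 0).astype(float)

def hinge_loss(y, fx):
    # Hinge loss (1 - y*f(x))_+, a convex upper bound on the 0-1 loss.
    return np.maximum(0.0, 1.0 - y * fx)

y  = np.array([1.0, -1.0, 1.0, -1.0])    # true labels in {-1, +1}
fx = np.array([2.0, -0.5, -0.3, 0.1])    # classifier scores f(x_i)

print(zero_one_loss(y, fx))  # [0. 0. 1. 1.]
print(hinge_loss(y, fx))     # [0.  0.5 1.3 1.1]
</syntaxhighlight>

On the third point <math>y_if(x_i) = -0.3</math>, so the example is misclassified and the hinge loss (1.3) upper-bounds the 0-1 loss (1); on the second point the example is correctly classified but lies inside the margin (<math>0 < y_if(x_i) < 1</math>), so the hinge loss is still positive (0.5) while the 0-1 loss is 0.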