Regularization perspectives on support vector machines: Difference between revisions

Elmackev (talk | contribs)
No edit summary
Special properties of the hinge loss: Repairing links to disambiguation pages - You can help!
 
 
The simplest and most intuitive loss function for categorization is the misclassification loss, or 0-1 loss, which is 0 if <math>f(x_i)=y_i</math> and 1 if <math>f(x_i) \neq y_i</math>, i.e. the [[Heaviside step function]] applied to <math>-y_if(x_i)</math>. However, this loss function is not [[convex function|convex]], which makes the resulting regularization problem computationally difficult to minimize. We therefore look for convex substitutes for the 0-1 loss. The hinge loss, <math>V(y_i,f(x_i)) = (1-y_if(x_i))_+</math>, where <math>(s)_+ = \max(s,0)</math>, provides such a [[convex relaxation]]. In fact, the hinge loss is the tightest convex [[upper bound]] on the 0-1 misclassification loss,<ref>{{cite journal|last=Lee|first=Yoonkyung|coauthors=Grace Wahba|title=Multicategory Support Vector Machines|journal=Journal of the American Statistical Association|year=2004|volume=99|issue=465|pages=67-81|doi=10.1198/016214504000000098|url=http://www.tandfonline.com/doi/abs/10.1198/016214504000000098}}</ref> and its minimizer over the expected risk (i.e. in the limit of infinite data) is the [[Bayes]] optimal classifier:<ref>{{cite journal|last=Lin|first=Yi|title=Support Vector Machines and the Bayes Rule in Classification|journal=Data Mining and Knowledge Discovery|year=2002|month=July|volume=6|issue=3|pages=259-275|doi=10.1023/A:1015469627679|url=http://cbio.ensmp.fr/~jvert/svn/bibli/local/Lin2002Support.pdf}}</ref><ref>{{cite journal|last=Rosasco|first=Lorenzo|coauthors=Ernesto De Vito, Andrea Caponnetto, Michele Piana and Alessandro Verri|title=Are Loss Functions All the Same|journal=Neural Computation|year=2004|month=May|volume=16|issue=5|pages=1063-1076|doi=10.1162/089976604773135104|url=http://www.mitpressjournals.org/doi/pdf/10.1162/089976604773135104}}</ref>
 
<math>f_b(x) = \left\{\begin{matrix}1&p(1|x)>p(-1|x)\\-1&p(1|x)<p(-1|x)\end{matrix}\right.</math>
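Both properties can be checked numerically with the short sketch below. It is not taken from the cited sources, and the function names are purely illustrative. The sketch checks two things:

* The hinge loss <math>(1-yf(x))_+</math> lies above the 0-1 loss on a grid of margins. Here the 0-1 loss is taken to be the Heaviside step on <math>-yf(x)</math>, with a margin of exactly 0 counted as an error; that convention is an assumption.
* At a single point <math>x</math> with <math>p = p(1|x)</math>, the conditional expected hinge loss <math>p(1-f)_+ + (1-p)(1+f)_+</math> is minimized at <math>f=1</math> when <math>p>1/2</math> and at <math>f=-1</math> when <math>p<1/2</math>. This agrees with the Bayes rule <math>f_b</math> above.

```python
import numpy as np

def zero_one_loss(y, fx):
    """0-1 loss as a Heaviside step on -y*f(x); a zero margin counts as an error (assumed convention)."""
    return (-y * fx >= 0).astype(float)

def hinge_loss(y, fx):
    """Hinge loss (1 - y*f(x))_+ = max(1 - y*f(x), 0)."""
    return np.maximum(1.0 - y * fx, 0.0)

def expected_hinge(p, f):
    """Expected hinge loss at a fixed x with p = p(1|x):
    p * (1 - f)_+ + (1 - p) * (1 + f)_+."""
    return p * np.maximum(1.0 - f, 0.0) + (1.0 - p) * np.maximum(1.0 + f, 0.0)

# Candidate values of f(x) used for a brute-force minimization (illustrative choice).
F_GRID = np.linspace(-3.0, 3.0, 601)

def hinge_minimizer(p):
    """Return the grid value of f that minimizes the expected hinge loss."""
    return F_GRID[np.argmin(expected_hinge(p, F_GRID))]

# With y = 1 the margin y*f(x) equals f(x); check the upper-bound property on a grid.
margins = np.linspace(-2.0, 2.0, 401)
y = np.ones_like(margins)
assert np.all(hinge_loss(y, margins) >= zero_one_loss(y, margins))

# Compare the minimizer of the expected hinge loss with the Bayes rule f_b(x).
for p in (0.2, 0.4, 0.6, 0.9):
    f_star = hinge_minimizer(p)
    bayes = 1 if p > 0.5 else -1
    print(f"p(1|x)={p}: argmin f = {f_star:+.2f}, Bayes rule = {bayes:+d}")
```

A grid search is used here instead of the closed-form argument, so that the example does not presuppose the result. The tie case <math>p = 1/2</math> is left out because the expected hinge loss is then constant on <math>[-1, 1]</math>.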