Revision as of 14:36, 29 May 2012 by Elmackev (no edit summary); revision as of 20:16, 10 June 2012 by CarrieVS (→Special properties of the hinge loss: Repairing links to disambiguation pages)
Line 22: The simplest and most intuitive loss function for categorization is the misclassification loss, or 0-1 loss, which is 0 if <math>f(x_i)=y_i</math> and 1 if <math>f(x_i) \neq y_i</math>, i.e. the [[Heaviside step function]] on <math>-y_if(x_i)</math>. However, this loss function is not [[convex function|convex]], which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0-1 loss. The hinge loss, <math>V(y_i,f(x_i)) = (1-y_if(x_i))_+</math>, where <math>(s)_+ = \max(s,0)</math>, provides such a [[convex relaxation]]. In fact, the hinge loss is the tightest convex [[upper bound]] to the 0-1 misclassification loss function,<ref>{{cite journal|last=Lee|first=Yoonkyung|coauthors=Yi Lin, Grace Wahba|title=Multicategory Support Vector Machines|journal=Journal of the American Statistical Association|year=2004|volume=99|issue=465|pages=67-81|doi=10.1198/016214504000000098|url=http://www.tandfonline.com/doi/abs/10.1198/016214504000000098}}</ref> and with infinite data it returns the [[Bayes]] optimal solution:<ref>{{cite journal|last=Lin|first=Yi|title=Support Vector Machines and the Bayes Rule in Classification|journal=Data Mining and Knowledge Discovery|year=2002|month=July|volume=6|issue=3|pages=259-275|doi=10.1023/A:1015469627679|url=http://cbio.ensmp.fr/~jvert/svn/bibli/local/Lin2002Support.pdf}}</ref><ref>{{cite journal|last=Rosasco|first=Lorenzo|coauthors=Ernesto De Vito, Andrea Caponnetto, Michele Piana and Alessandro Verri|title=Are Loss Functions All the Same?|journal=Neural Computation|year=2004|month=May|volume=16|issue=5|pages=1063-1076|doi=10.1162/089976604773135104|url=http://www.mitpressjournals.org/doi/pdf/10.1162/089976604773135104}}</ref> <math>f_b(x) = \begin{cases}1, & p(1 \mid x)>p(-1 \mid x)\\-1, & p(1 \mid x)<p(-1 \mid x)\end{cases}</math>
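
The relationship between the two losses can be checked numerically. The following is a minimal sketch, not taken from the cited sources: it assumes real-valued scores <math>f(x_i)</math> whose sign gives the predicted label, and it counts a score of exactly zero as a misclassification (one possible convention for the step function at zero). The function names and the sample values are hypothetical.

```python
def zero_one_loss(y, fx):
    """0-1 loss as a step function of -y*f(x).

    Assumption: a margin y*f(x) of exactly zero is counted as an error.
    """
    return 1.0 if y * fx <= 0 else 0.0


def hinge_loss(y, fx):
    """Hinge loss (1 - y*f(x))_+ with (s)_+ = max(s, 0)."""
    return max(1.0 - y * fx, 0.0)


# Hypothetical (label, score) pairs, with labels in {-1, +1}.
samples = [(1, 2.0), (1, 0.3), (-1, -1.5), (-1, 0.4), (1, -0.7)]

zero_one = [zero_one_loss(y, fx) for y, fx in samples]
hinge = [hinge_loss(y, fx) for y, fx in samples]

for (y, fx), z, h in zip(samples, zero_one, hinge):
    print(f"y={y:+d}  f(x)={fx:+.1f}  0-1 loss={z:.0f}  hinge loss={h:.2f}")
```

On every sample the hinge loss is at least as large as the 0-1 loss, consistent with it being an upper bound. It is also positive for correctly classified points whose margin <math>y_if(x_i)</math> is below 1, such as the second sample.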

Regularization perspectives on support vector machines: Difference between revisions