[[File:Hinge and Misclassification Loss.png|thumb|Hinge and misclassification loss functions]]
The simplest and most intuitive loss function for categorization is the misclassification loss, or 0-1 loss, which is 0 if <math>f(x_i)=y_i</math> and 1 if <math>f(x_i) \neq y_i</math>, i.e. the [[Heaviside step function]] applied to <math>-y_if(x_i)</math>. However, this loss function is not [[convex function|convex]], which makes the regularization problem very difficult to minimize computationally. Therefore, we look for convex substitutes for the 0-1 loss. The hinge loss, <math> V(y_i,f(x_i)) = (1-y_if(x_i))_+</math>, where <math>(s)_+ = \max(s,0)</math>, provides such a [[convex relaxation]]. In fact, the hinge loss is the tightest convex [[upper bound]] to the 0-1 misclassification loss function,<ref name="Lee 2012 67–81"/> and with infinite data returns the [[Bayes' theorem|Bayes]]-optimal solution:<ref name="Rosasco 2004 1063–1076"/><ref>{{cite journal|last=Lin|first=Yi|title=Support Vector Machines and the Bayes Rule in Classification|journal=Data Mining and Knowledge Discovery|date=July 2002|volume=6|issue=3|pages=259–275|doi=10.1023/A:1015469627679|url=http://cbio.ensmp.fr/~jvert/svn/bibli/local/Lin2002Support.pdf}}</ref>
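As a minimal numerical sketch (illustrative only, not drawn from the cited sources, and assuming [[NumPy]] and real-valued classifier scores <math>f(x_i)</math>), the two losses can be computed and compared directly:

<syntaxhighlight lang="python">
import numpy as np

def zero_one_loss(y, fx):
    # 0-1 loss: 1 for each misclassified point, i.e. whenever y*f(x) <= 0
    # (the Heaviside step of -y*f(x); the value at exactly 0 is a convention).
    return (y * fx <= 0).astype(float)

def hinge_loss(y, fx):
    # Hinge loss (1 - y*f(x))_+, a convex upper bound on the 0-1 loss.
    return np.maximum(0.0, 1.0 - y * fx)

y  = np.array([1.0, -1.0, 1.0, -1.0])    # true labels in {-1, +1}
fx = np.array([2.0, -0.5, -0.3, 0.1])    # classifier scores f(x_i)

print(zero_one_loss(y, fx))  # [0. 0. 1. 1.]
print(hinge_loss(y, fx))     # [0.  0.5 1.3 1.1]
</syntaxhighlight>

On the third point <math>y_if(x_i) = -0.3</math>, so the example is misclassified and the hinge loss (1.3) upper-bounds the 0-1 loss (1); on the second point the example is correctly classified but lies inside the margin (<math>0 < y_if(x_i) < 1</math>), so the hinge loss is still positive (0.5) while the 0-1 loss is 0.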