[[File:BayesConsistentLosses2.jpg|thumb|Bayes consistent loss functions: Zero-one loss (gray), Savage loss (green), Logistic loss (orange), Exponential loss (purple), Tangent loss (brown), Square loss (blue)]]
In [[machine learning]] and [[mathematical optimization]], '''loss functions for classification''' are computationally feasible [[loss functions]] representing the price paid for inaccuracy of predictions in [[statistical classification|classification problem]]s (problems of identifying which category a particular observation belongs to).<ref name="mit">{{Cite journal | last1 = Rosasco | first1 = L. | last2 = De Vito | first2 = E. D. | last3 = Caponnetto | first3 = A. | last4 = Piana | first4 = M. | last5 = Verri | first5 = A. | url = http://web.mit.edu/lrosasco/www/publications/loss.pdf| title = Are Loss Functions All the Same? | doi = 10.1162/089976604773135104 | journal = Neural Computation | volume = 16 | issue = 5 | pages = 1063–1076 | year = 2004 | pmid = 15070510| citeseerx = 10.1.1.109.6786 }}</ref> Given <math>\mathcal{X}</math> as the space of all possible inputs and <math>\mathcal{Y} = \{-1,1\}</math> as the set of labels, a typical goal of classification algorithms is to find a function <math>f: \mathcal{X} \to \mathcal{Y}</math> which best predicts a label <math>y</math> for a given input <math>\vec{x}</math>. However, because of incomplete information, noise in the measurement, or probabilistic components in the underlying process, the same <math>\vec{x}</math> may generate different <math>y</math>. As a result, the goal of the learning problem is to minimize the expected loss (also known as the risk), defined as
:<math>I[f] = \displaystyle \int_{\mathcal{X} \times \mathcal{Y}} V(f(\vec{x}),y) p(\vec{x},y) \, d\vec{x} \, dy</math>
where <math>V(f(\vec{x}),y)</math> is a given loss function, and <math>p(\vec{x},y)</math> is the [[probability density function]] of the process that generated the data, which can equivalently be written as
:<math>p(\vec{x},y)=p(y\mid\vec{x}) p(\vec{x}).</math>
In the case of binary classification, it is possible to simplify the calculation of expected risk from the integral specified above. Specifically,
:<math>
\begin{align}
I[f] & = \int_{\mathcal{X} \times \mathcal{Y}} V(f(\vec{x}),y) p(\vec{x},y) \,d\vec{x} \,dy \\[6pt]
& = \int_\mathcal{X} \int_\mathcal{Y} V(f(\vec{x}),y) \, p(y\mid\vec{x}) \, p(\vec{x}) \,dy \,d\vec{x} \\[6pt]
& = \int_\mathcal{X} \left[ V(f(\vec{x}),1) \, p(1\mid\vec{x}) + V(f(\vec{x}),-1) \, p(-1\mid\vec{x}) \right] p(\vec{x}) \,d\vec{x} \\[6pt]
& = \int_\mathcal{X} \left[ V(f(\vec{x}),1) \, p(1\mid\vec{x}) + V(f(\vec{x}),-1) \, \left(1-p(1\mid\vec{x})\right) \right] p(\vec{x}) \,d\vec{x}
\end{align}
</math>
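As a concrete numerical check (not part of the original derivation), the following Python sketch compares the two-term conditional form above against a direct Monte Carlo estimate of the expected loss at a fixed input; the choice of square loss and the particular values of <math>f(\vec{x})</math> and <math>p(1\mid\vec{x})</math> are illustrative assumptions.
<syntaxhighlight lang="python">
import random

def square_loss(fx, y):
    # Square loss V(f(x), y) = (1 - y*f(x))**2, one of the losses plotted above.
    return (1.0 - y * fx) ** 2

def conditional_risk(fx, eta, loss):
    # Two-term form from the derivation:
    # V(f(x), 1) * p(1|x) + V(f(x), -1) * (1 - p(1|x)).
    return loss(fx, 1) * eta + loss(fx, -1) * (1.0 - eta)

def monte_carlo_risk(fx, eta, loss, n=200000, seed=0):
    # Direct estimate of E[V(f(x), y) | x], drawing y from p(y|x):
    # y = +1 with probability eta, y = -1 otherwise.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        y = 1 if rng.random() < eta else -1
        total += loss(fx, y)
    return total / n

fx, eta = 0.3, 0.7  # hypothetical score f(x) and conditional probability p(1|x)
print(conditional_risk(fx, eta, square_loss))   # exact two-term expression
print(monte_carlo_risk(fx, eta, square_loss))   # Monte Carlo estimate, close to the above
</syntaxhighlight>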
The most natural loss function for classification is the 0–1 loss function,
:<math>V(f(\vec{x}),y)=H(-yf(\vec{x})),</math>
where <math>H</math> indicates the [[Heaviside step function]].
However, this loss function is non-convex and non-smooth, and solving for the optimal solution is an [[NP-hard]] combinatorial optimization problem.<ref name="Utah">{{Citation | last= Piyush | first= Rai | title= Support Vector Machines (Contd.), Classification Loss Functions and Regularizers | publisher= Utah CS5350/6350: Machine Learning | date= 13 September 2011 | url= http://www.cs.utah.edu/~piyush/teaching/13-9-print.pdf | accessdate= 6 December 2014}}</ref> As a result, it is better to substitute surrogate loss functions that are tractable for commonly used learning algorithms, as they have convenient properties such as being convex and smooth.
In practice, the probability distribution <math>p(\vec{x},y)</math> is unknown. Consequently, utilizing a training set of <math>n</math> [[iid|independently and identically distributed]] sample points
:<math>S = \{(\vec{x}_1,y_1), \dots ,(\vec{x}_n,y_n)\}</math>
drawn from the data sample space, one seeks to minimize the [[empirical risk minimization|empirical risk]]
:<math>I_S[f] = \frac{1}{n} \sum_{i=1}^n V(f(\vec{x}_i),y_i)</math>
as a proxy for the expected risk.
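The following Python sketch illustrates this empirical-risk proxy on a small, hypothetical training set; the toy data, the fixed linear scorer, and the particular margin-based surrogates (hinge, logistic, exponential, square, matching the losses plotted above) are assumptions made only for illustration.
<syntaxhighlight lang="python">
import math

# Toy 1-D training set S = {(x_i, y_i)}; the data are purely illustrative.
S = [(-2.0, -1), (-1.0, -1), (-0.5, 1), (0.5, 1), (1.5, 1), (2.5, 1)]

def f(x, w=1.0, b=0.0):
    # A fixed linear scorer f(x) = w*x + b standing in for a learned classifier.
    return w * x + b

# Margin-based surrogate losses phi(v), evaluated at the margin v = y*f(x).
surrogates = {
    "hinge":       lambda v: max(0.0, 1.0 - v),
    "logistic":    lambda v: math.log(1.0 + math.exp(-v)) / math.log(2.0),
    "exponential": lambda v: math.exp(-v),
    "square":      lambda v: (1.0 - v) ** 2,
}

def empirical_risk(phi):
    # I_S[f] = (1/n) * sum_i V(f(x_i), y_i), with V(f(x), y) = phi(y*f(x)).
    return sum(phi(y * f(x)) for x, y in S) / len(S)

for name, phi in surrogates.items():
    print(f"{name:12s} empirical risk: {empirical_risk(phi):.3f}")
</syntaxhighlight>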
== Hinge loss ==
{{main|Hinge loss}}
The hinge loss function is defined with <math>\phi(\upsilon) = \max(0, 1-\upsilon) = [1-\upsilon]_{+}</math>, where <math>[a]_{+} = \max(0,a)</math> is the [[positive part]] function.
:<math>V(f(\vec{x}),y) = \max(0, 1-yf(\vec{x})) = [1-yf(\vec{x})]_{+}.</math>
The hinge loss provides a relatively tight, convex upper bound on the 0–1 [[indicator function]]. Specifically, the hinge loss equals the 0–1 [[indicator function]] when <math>\operatorname{sgn}(f(\vec{x})) = y</math> and <math>|yf(\vec{x})| \geq 1</math>. In addition, the empirical risk minimization of this loss is equivalent to the classical formulation for [[support vector machines]] (SVMs). Correctly classified points lying outside the margin boundaries of the support vectors are not penalized, whereas points within the margin boundaries or on the wrong side of the hyperplane are penalized linearly in proportion to their distance from the correct boundary.<ref name="Utah" />
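A minimal Python sketch of this behaviour is given below; the sample scores are hypothetical and chosen only to show where the hinge loss is zero, where it coincides with the 0–1 indicator, and where it grows linearly.
<syntaxhighlight lang="python">
def hinge(y, fx):
    # Hinge loss max(0, 1 - y*f(x)): zero for correctly classified points with
    # margin y*f(x) >= 1, growing linearly as a point approaches or crosses
    # the decision boundary.
    return max(0.0, 1.0 - y * fx)

def zero_one(y, fx):
    # 0-1 indicator loss: 1 if the predicted sign disagrees with the label.
    return 0.0 if y * fx > 0 else 1.0

# Hypothetical scores f(x): outside the margin, on the margin, inside it, misclassified.
for y, fx in [(1, 2.5), (1, 1.0), (1, 0.4), (1, -0.3), (-1, 0.8)]:
    print(f"y={y:+d}  f(x)={fx:+.1f}  hinge={hinge(y, fx):.2f}  0-1={zero_one(y, fx):.0f}")

# The hinge value is never below the 0-1 value (a convex upper bound), and the two
# coincide whenever the point is correctly classified with margin y*f(x) >= 1.
</syntaxhighlight>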