For computational ease, it is standard practice to write [[loss functions]] as functions of only one variable. Within classification, loss functions are generally written solely in terms of the product of the true label <math>y</math> and the predicted value <math>f(\vec{x})</math>.<ref name="robust">{{Citation | last= Masnadi-Shirazi | first= Hamed | last2= Vasconcelos | first2= Nuno | title= On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost | publisher= Statistical Visual Computing Laboratory, University of California, San Diego | url= http://www.svcl.ucsd.edu/publications/conference/2008/nips08/NIPS08LossesWITHTITLE.pdf | accessdate= 6 December 2014}}</ref> Selection of a loss function within this framework
:<math>V(f(\vec{x}),y)=\phi(yf(\vec{x}))</math>
impacts the optimal <math>f^{*}_{\phi}</math> which minimizes the expected risk.
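To illustrate the single-variable form above, the following minimal sketch (not taken from the cited sources; the function names, scalings and numerical values are illustrative choices) writes a few well-known classification losses as a function <math>\phi(v)</math> of the margin <math>v = yf(\vec{x})</math>.
<syntaxhighlight lang="python">
import numpy as np

# Common classification losses expressed as a function phi of the single
# variable v = y * f(x), the "margin"; scalings are illustrative choices.
def hinge(v):        # hinge loss (used by support vector machines)
    return np.maximum(0.0, 1.0 - v)

def logistic(v):     # logistic loss (here without the 1/ln 2 normalisation)
    return np.log1p(np.exp(-v))

def exponential(v):  # exponential loss (used by AdaBoost)
    return np.exp(-v)

y, fx = 1, 0.4       # example true label and predicted value
v = y * fx           # all three losses depend on y and f(x) only through this product
print(hinge(v), logistic(v), exponential(v))
</syntaxhighlight>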
Given the binary nature of classification, a natural selection for a loss function (assuming equal cost for [[false positives and false negatives]]) would be the [[0-1 loss function]] (0–1 [[indicator function]]), which takes the value 0 if the predicted classification equals the true class and the value 1 if it does not. This selection is modeled by
:<math>V(f(\vec{x}),y)=H(-yf(\vec{x}))</math>
where <math>H</math> is the [[Heaviside step function]].
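As a concrete sketch (illustrative, not from the cited sources; counting the boundary case <math>yf(\vec{x})=0</math> as an error is an assumed convention), the 0–1 loss simply indicates whether the sign of the prediction disagrees with the true label.
<syntaxhighlight lang="python">
def zero_one_loss(y, fx):
    """0-1 loss: 0 if sign(fx) matches the label y in {-1, +1}, else 1.

    The boundary case y * fx == 0 is counted as an error here; this is a
    convention choice, not something fixed by the definition above.
    """
    return 0 if y * fx > 0 else 1

print(zero_one_loss(+1, 0.7))   # 0: correctly classified
print(zero_one_loss(-1, 0.7))   # 1: misclassified
</syntaxhighlight>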
==Bounds for classification==
Using [[Bayes' theorem]], it can be shown that for a binary classification problem the optimal <math>f^*</math>, i.e. the function implementing the Bayes optimal decision rule, is
:<math>f^*(\vec{x}) \;=\; \begin{cases} \;\;\;1& \text{if }p(1\mid\vec{x}) > p(-1\mid \vec{x}) \\ \;\;\;0 & \text{if }p(1\mid\vec{x}) = p(-1\mid\vec{x}) \\ -1 & \text{if }p(1\mid\vec{x}) < p(-1\mid\vec{x}) \end{cases}</math>.
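A minimal sketch of this decision rule, assuming the posterior probability <math>p(1\mid\vec{x})</math> is available as an input (the function and variable names are illustrative):
<syntaxhighlight lang="python">
def bayes_rule(p_pos):
    """Bayes optimal decision given p_pos = p(1|x); p(-1|x) = 1 - p_pos."""
    p_neg = 1.0 - p_pos
    if p_pos > p_neg:
        return 1
    if p_pos < p_neg:
        return -1
    return 0            # exact tie between the two posterior probabilities

print(bayes_rule(0.8))  # 1
print(bayes_rule(0.3))  # -1
print(bayes_rule(0.5))  # 0
</syntaxhighlight>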
A loss function <math>\phi(yf(\vec{x}))</math> is said to be ''classification-calibrated'' or ''Bayes consistent'' if its optimal <math>f^*_{\phi}</math> is such that <math>f^*_{\phi}(\vec{x}) = \operatorname{sgn}(f^*(\vec{x}))</math>, and it is thus equivalent to the Bayes optimal decision rule. A Bayes consistent loss function allows us to find the Bayes optimal decision function by directly minimizing the expected risk, without having to explicitly model the probability density functions. Furthermore, it can be shown that for any convex loss function <math>V(yf(\vec{x}))</math>, if <math>f_0</math> is the function that minimizes the corresponding expected risk, <math>f_0(\vec{x}) \ne 0</math>, and <math>V</math> is decreasing in a neighborhood of 0, then <math>f^*(\vec{x}) = \operatorname{sgn}(f_0(\vec{x}))</math>
where <math>\operatorname{sgn}</math> is the [[sign function]] (for proof see <ref>{{Cite
This fact confers a consistency property upon such convex loss functions: minimizing any of them leads, in the limit of infinite data, to the same classifications as minimizing the 0–1 loss. Consequently, the excess expected risk under the 0–1 loss can be bounded in terms of the excess expected risk under any of these convex loss functions.<ref name="mit" />
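As a small numerical sketch of this consistency property (using the logistic loss as one example of a convex loss that is decreasing near 0; the grid search and the posterior probabilities are illustrative choices, not part of the cited sources), the minimizer of the conditional expected loss has the same sign as the Bayes optimal rule:
<syntaxhighlight lang="python">
import numpy as np

def conditional_logistic_risk(f, p_pos):
    # E[ log(1 + exp(-y*f)) | x ] when p(1|x) = p_pos and p(-1|x) = 1 - p_pos
    return p_pos * np.log1p(np.exp(-f)) + (1.0 - p_pos) * np.log1p(np.exp(f))

grid = np.linspace(-5.0, 5.0, 100001)      # crude one-dimensional search over f(x)
for p_pos in (0.2, 0.5, 0.9):
    f0 = grid[np.argmin(conditional_logistic_risk(grid, p_pos))]
    # The exact minimizer is the log-odds ln(p_pos / (1 - p_pos)), so sgn(f0)
    # reproduces the Bayes optimal decision for each posterior probability.
    print(p_pos, round(float(f0), 3), int(np.sign(f0)))
</syntaxhighlight>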