Loss functions for classification
'''Loss function surrogates for classification''' are computationally feasible [[loss functions]] that represent the price paid for inaccurate predictions in classification problems.<ref>{{cite doi|10.1162/089976604773135104}}</ref> Specifically, if <math>g: X \to \{-1,1\}</math> denotes the true mapping of a vector <math>\vec{x} \in X</math> to a class label <math>y \in \{-1,1\}</math>, we wish to find a function <math>f: X \to \mathbb{R}</math> that best approximates <math>g</math>. (citation needed) Many loss functions for classification depend only on the product of the true label <math>y</math> and the predicted value <math>f(\vec{x})</math>, known as the margin, so it is standard practice to write them in the form <math>V(f(\vec{x}),y)=\phi(yf(\vec{x}))</math>. (citation needed) The choice of <math>\phi</math> affects both the optimal <math>f^{*}</math> that minimizes the empirical risk and the computational complexity of the learning algorithm.
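The margin-based form above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the linear predictor <math>f</math>, the weight vector, and the particular <math>\phi</math> chosen here are assumptions made for the example.

```python
import numpy as np

# Hypothetical linear predictor f(x) = w . x; the choice of f is an
# assumption for illustration -- any f: X -> R would do.
w = np.array([0.5, -1.0])

def f(x):
    return np.dot(w, x)

def margin_loss(phi, x, y):
    # Margin-based loss V(f(x), y) = phi(y * f(x)), where y is in {-1, 1}.
    return phi(y * f(x))

# One example surrogate: phi(v) = (1 - v)^2, i.e. the square loss.
square_phi = lambda v: (1.0 - v) ** 2

x = np.array([1.0, 0.5])
y = 1
print(margin_loss(square_phi, x, y))  # here f(x) = 0, so the loss is 1.0
```

Swapping in a different <math>\phi</math> changes the surrogate without touching the predictor, which is the point of writing the loss as a function of the margin alone.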
Given the binary nature of classification, a natural choice of loss function (assuming equal cost for [[false positives and false negatives]]) is the 0–1 [[indicator function]], which takes the value 0 if the predicted class matches the true class and 1 otherwise.
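The 0–1 loss just described can be sketched as follows, taking the sign of the real-valued prediction <math>f(\vec{x})</math> as the predicted class; the tie-breaking of a zero prediction toward +1 is an assumption of this sketch.

```python
def zero_one_loss(fx, y):
    # 0-1 indicator loss: 0 if sign(f(x)) matches the true label y in {-1, 1},
    # 1 otherwise (equal cost for false positives and false negatives).
    predicted = 1 if fx >= 0 else -1
    return 0 if predicted == y else 1

print(zero_one_loss(0.7, 1))   # correct sign -> 0
print(zero_one_loss(-0.2, 1))  # wrong sign   -> 1
```

Because this loss is piecewise constant, it is non-convex and non-smooth, which is precisely why the computationally feasible surrogates introduced above are used in its place.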
 
== Square Loss ==