Loss functions for classification
'''Loss function surrogates for classification''' are computationally feasible [[loss functions]] that represent the price paid for inaccurate predictions in classification problems.<ref>{{cite doi|10.1162/089976604773135104}}</ref> Specifically, if <math>g: X \to \{-1,1\}</math> denotes the true mapping of a vector <math>\vec{x} \in X</math> to a class label <math>y \in \{-1,1\}</math>, we wish to find a function <math>f: X \to \mathbb{R}</math> that best approximates <math>g</math>. (citation needed) Many loss functions for classification depend only on the product of the true label <math>y</math> and the predicted value <math>f(\vec{x})</math>, known as the margin, so it is standard practice to write them in the form <math>V(f(\vec{x}),y)=\phi(yf(\vec{x}))</math>. (citation needed) The choice of <math>\phi</math> affects both the optimal <math>f^{*}</math> that minimizes the empirical risk and the computational complexity of the learning algorithm.
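The margin-based form above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the linear predictor <math>f</math>, the weight vector, and the particular <math>\phi</math> chosen here are assumptions made for the example.

```python
import numpy as np

# Hypothetical linear predictor f(x) = w . x; the choice of f is an
# assumption for illustration -- any f: X -> R would do.
w = np.array([0.5, -1.0])

def f(x):
    return np.dot(w, x)

def margin_loss(phi, x, y):
    # Margin-based loss V(f(x), y) = phi(y * f(x)), where y is in {-1, 1}.
    return phi(y * f(x))

# One example surrogate: phi(v) = (1 - v)^2, i.e. the square loss.
square_phi = lambda v: (1.0 - v) ** 2

x = np.array([1.0, 0.5])
y = 1
print(margin_loss(square_phi, x, y))  # here f(x) = 0, so the loss is 1.0
```

Swapping in a different <math>\phi</math> changes the surrogate without touching the predictor, which is the point of writing the loss as a function of the margin alone.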
Given the binary nature of classification, a natural choice of loss function (assuming equal cost for [[false positives and false negatives]]) is the 0–1 [[indicator function]], which takes the value 0 if the predicted class matches the true class and 1 otherwise.
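The 0–1 loss just described can be sketched as follows, taking the sign of the real-valued prediction <math>f(\vec{x})</math> as the predicted class; the tie-breaking of a zero prediction toward +1 is an assumption of this sketch.

```python
def zero_one_loss(fx, y):
    # 0-1 indicator loss: 0 if sign(f(x)) matches the true label y in {-1, 1},
    # 1 otherwise (equal cost for false positives and false negatives).
    predicted = 1 if fx >= 0 else -1
    return 0 if predicted == y else 1

print(zero_one_loss(0.7, 1))   # correct sign -> 0
print(zero_one_loss(-0.2, 1))  # wrong sign   -> 1
```

Because this loss is piecewise constant, it is non-convex and non-smooth, which is precisely why the computationally feasible surrogates introduced above are used in its place.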
 
== Square Loss ==