Loss functions for classification: Difference between revisions

impacts the optimal <math>f^{*}</math> which minimizes empirical risk, as well as the computational complexity of the learning algorithm.
 
Given the binary nature of classification, a natural selection for a loss function (assuming equal cost for [[false positives and false negatives]]) would be the 0-1 [[indicator function]], which takes the value 0 if the predicted classification equals the true class and 1 if it does not. Consequently, we could choose the loss function:
:<math>V(f(\vec{x}),y)=\mathbf{\theta}(-yf(\vec{x}))</math>
where <math>\mathbf{\theta}</math> indicates the [[Heaviside step function]].
However, this loss function is non-convex and non-smooth, and solving for the optimal solution is an [[NP-hard]] combinatorial optimization problem. (cite utah) As a result, we seek continuous, convex '''loss function surrogates''' which are tractable for our learning algorithms. Some of these surrogates are described below.
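As a concrete illustration of the 0-1 loss defined above, the following sketch evaluates <math>\theta(-yf(\vec{x}))</math> on a few hypothetical predictions (the function name and sample values are illustrative, not from the article):

```python
import numpy as np

def zero_one_loss(f_x, y):
    """0-1 loss theta(-y * f(x)), with theta the Heaviside step function.

    y is the true label in {-1, +1}; f_x is the real-valued prediction,
    whose sign is the predicted class. The loss is 1 exactly when the
    margin y * f(x) is negative, i.e. the prediction disagrees in sign
    with the true label.
    """
    return np.where(-y * f_x > 0, 1.0, 0.0)

# Hypothetical predictions and true labels.
f_x = np.array([2.3, -0.7, 0.5, -1.2])
y   = np.array([1,    1,  -1,   -1  ])

print(zero_one_loss(f_x, y))  # 1.0 exactly where sign(f(x)) != y
```

Because this loss is piecewise constant, its gradient is zero almost everywhere, which is one way to see why gradient-based learning algorithms cannot optimize it directly and convex surrogates are used instead.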
 
== Square Loss ==