impacts the optimal <math>f^{*}</math> which minimizes empirical risk, as well as the computational complexity of the learning algorithm.
Given the binary nature of classification, a natural selection for a loss function (assuming equal cost for [[false positives and false negatives]]) would be the 0–1 [[indicator function]], which takes the value 0 if the predicted classification equals the true class and the value 1 if it does not. Consequently, we could choose the loss function:

:<math>V(f(\vec{x}),y) = \theta(-y f(\vec{x}))</math>

where <math>\theta</math> indicates the [[Heaviside step function]].
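As an illustrative sketch (not part of the original text), the 0–1 loss can be evaluated directly via the Heaviside step function, assuming labels <math>y \in \{-1, +1\}</math> and the convention that a prediction exactly on the decision boundary counts as an error:

```python
def heaviside(s):
    # Heaviside step function: 1 for s >= 0, else 0
    # (convention: the boundary case s = 0 counts as an error)
    return 1.0 if s >= 0 else 0.0

def zero_one_loss(f_x, y):
    # 0-1 loss theta(-y * f(x)) with labels y in {-1, +1}:
    # 0 when sign(f(x)) agrees with y, 1 otherwise
    return heaviside(-y * f_x)

print(zero_one_loss(0.8, 1))    # correct prediction  -> 0.0
print(zero_one_loss(-0.3, 1))   # wrong prediction    -> 1.0
print(zero_one_loss(-2.0, -1))  # correct prediction  -> 0.0
```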
However, this loss function is non-convex and non-smooth, and solving for the optimal solution is an [[NP-hard]] combinatorial optimization problem. As a result, we seek continuous, convex '''loss function surrogates''' that are tractable for our learning algorithms. Some of these surrogates are described below.
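As a brief sketch of the idea (the specific surrogates here, hinge and logistic loss, are two common examples rather than an exhaustive list), each convex surrogate upper-bounds the 0–1 loss as a function of the margin <math>m = y f(\vec{x})</math>:

```python
import math

def zero_one(m):
    # 0-1 loss as a function of the margin m = y * f(x)
    return 1.0 if m <= 0 else 0.0

def hinge(m):
    # hinge loss: convex, piecewise linear, upper-bounds the 0-1 loss
    return max(0.0, 1.0 - m)

def logistic(m):
    # logistic loss, scaled by 1/log(2) so it passes through (0, 1)
    return math.log(1.0 + math.exp(-m)) / math.log(2.0)

# both surrogates dominate the 0-1 loss at every margin value
for m in (-1.0, 0.0, 0.5, 2.0):
    assert hinge(m) >= zero_one(m)
    assert logistic(m) >= zero_one(m)
```

Because these surrogates are convex and smooth (or at least subdifferentiable), minimizing them is tractable with standard optimization methods, unlike the combinatorial problem posed by the 0–1 loss itself.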