Loss functions for classification

The expected risk can be decomposed as

<math>
\begin{align}
I[f] &= \int_{X \times Y} V(f(\vec{x}),y)\,p(\vec{x},y)\,d\vec{x}\,dy \\
&= \int_X \int_Y \phi(yf(\vec{x}))\,p(y\mid\vec{x})\,p(\vec{x})\,dy\,d\vec{x} \\
&= \int_X \left[\phi(f(\vec{x}))\,p(1\mid\vec{x})+\phi(-f(\vec{x}))\,p(-1\mid\vec{x})\right]p(\vec{x})\,d\vec{x} \\
&= \int_X \left[\phi(f(\vec{x}))\,p(1\mid\vec{x})+\phi(-f(\vec{x}))\,(1-p(1\mid\vec{x}))\right]p(\vec{x})\,d\vec{x}
\end{align}
</math>
 
The second equality follows from the properties described above. The third equality follows from the fact that 1 and −1 are the only possible values for <math>y</math>, and the fourth because <math>p(-1\mid x)=1-p(1\mid x)</math>. The term within brackets
<math>
[\phi(f(\vec{x})) p(1\mid\vec{x})+\phi(-f(\vec{x})) (1-p(1\mid\vec{x}))]
</math>
is known as the ''conditional risk''.
 
One can solve for the minimizer of <math>I[f]</math> by taking the functional derivative of the last expression with respect to <math>f</math> and setting the derivative equal to 0. Writing <math>\eta = p(1\mid\vec{x})</math>, this results in the following equation
 
<math>
\frac{\partial \phi(f)}{\partial f}\eta + \frac{\partial \phi(-f)}{\partial f}(1-\eta)=0 \;\;\;\;\;(1)
</math>
 
which is also equivalent to setting the derivative of the conditional risk equal to zero.
 
 
As a result, for any convex loss function with these properties, the minimizer of <math>I[f]</math> can be found by solving equation (1); the minimizers for all of the loss function surrogates described below are easily obtained in this way as functions of <math>p(1\mid\vec{x})</math>.<ref name="mitlec" />
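
For example, taking the exponential loss <math>\phi(v)=e^{-v}</math> (one convex loss with these properties), equation (1) becomes

<math>
-e^{-f}\eta + e^{f}(1-\eta)=0,
</math>

which yields the minimizer

<math>
f^{*}(\vec{x})=\frac{1}{2}\ln\frac{\eta}{1-\eta}=\frac{1}{2}\ln\frac{p(1\mid\vec{x})}{1-p(1\mid\vec{x})},
</math>

a function of the posterior probability alone.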
 
Given the binary nature of classification, a natural selection for a loss function (assuming equal cost for [[false positives and false negatives]]) would be the [[0-1 loss function]] (0–1 [[indicator function]]), which takes the value 0 if the predicted classification equals the true class and the value 1 if it does not. This selection is modeled by

<math>
V(f(\vec{x}),y)=H(-yf(\vec{x}))
</math>

where <math>H</math> is the [[Heaviside step function]].
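
For instance, an output <math>f(\vec{x})=0.4</math> on an example with true label <math>y=-1</math> gives <math>-yf(\vec{x})=0.4</math> and hence a loss of <math>H(0.4)=1</math> (a misclassification), while the same output with <math>y=1</math> gives a loss of <math>H(-0.4)=0</math>.
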
Bayes consistent loss functions can be generated using
<math>\phi(v)=C[f^{-1}(v)]+(1-f^{-1}(v))C'[f^{-1}(v)]</math>,
 
where <math>f(\eta)</math>, <math>0\leq \eta \leq 1</math>, is any invertible function such that <math>f^{-1}(-v)=1-f^{-1}(v)</math> and <math>C(\eta)</math> is any differentiable strictly concave function such that <math>C(\eta)=C(1-\eta)</math>. Table-I shows the generated Bayes consistent loss functions for some example choices of <math>C(\eta)</math> and <math>f^{-1}(v)</math>. Note that the Savage and Tangent losses are not convex. Such non-convex loss functions have been shown to be useful in dealing with outliers in classification.<ref name="robust" /><ref>{{Cite journal|last=Leistner|first=C.|last2=Saffari|first2=A.|last3=Roth|first3=P. M.|last4=Bischof|first4=H.|date=September 2009|title=On robustness of on-line boosting - a competitive study|url=https://ieeexplore.ieee.org/document/5457451|journal=2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops|pages=1362–1369|doi=10.1109/ICCVW.2009.5457451}}</ref> For such loss functions, the posterior probability <math>p(y=1\mid\vec{x})</math> can be derived using the invertible ''link function'' as <math>p(y=1\mid\vec{x})=\eta=f^{-1}(v)</math>.
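
As an illustration of this construction, consider the choices <math>C(\eta)=4\eta(1-\eta)</math> and <math>f(\eta)=2\eta-1</math>, so that <math>f^{-1}(v)=\tfrac{v+1}{2}</math>; this is one admissible pair, since <math>C</math> is differentiable and strictly concave with <math>C(\eta)=C(1-\eta)</math>, and <math>f^{-1}(-v)=1-f^{-1}(v)</math>. With <math>C'(\eta)=4-8\eta</math>, the formula gives

<math>
\phi(v)=4\cdot\frac{v+1}{2}\cdot\frac{1-v}{2}+\left(1-\frac{v+1}{2}\right)\left(4-8\cdot\frac{v+1}{2}\right)=(1-v^{2})-2v(1-v)=(1-v)^{2},
</math>

which is the square loss discussed below.
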
{| class="wikitable"
|+Table-I
|<math>\arctan(v)+\frac{1}{2}</math>
|<math>\tan(\eta-\frac{1}{2})</math>
|}<br />The sole minimizer of the expected risk associated with the above generated loss functions can be found from equation (1) and is equal to the corresponding <math>f(\eta)</math>. This holds even for the nonconvex loss functions, which means that gradient-descent-based algorithms such as [[Gradient boosting|gradient boosting]] can be used to effectively construct the minimizer in practice.
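
For example, with the link <math>f^{-1}(v)=\arctan(v)+\tfrac{1}{2}</math> (and <math>f(\eta)=\tan(\eta-\tfrac{1}{2})</math>) from Table-I, a posterior <math>p(1\mid\vec{x})=\eta=0.9</math> gives the minimizing prediction <math>f(\eta)=\tan(0.4)\approx 0.42</math>, and applying the link function to this prediction recovers the posterior: <math>\arctan(0.42)+\tfrac{1}{2}\approx 0.9</math>.
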
== Square loss ==
While more commonly used in regression, the square loss function can be re-written as a function <math>\phi(yf(\vec{x}))</math> and utilized for classification. Defined as