A loss function is said to be ''classification-calibrated'' or ''Bayes consistent'' if its optimal <math>f^*_{\phi}</math> is such that <math>f^*_{0/1}(\vec{x}) = \operatorname{sgn}(f^*_{\phi}(\vec{x}))</math>, and it is thus optimal under the Bayes decision rule. A Bayes consistent loss function allows us to find the Bayes optimal decision function <math>f^*_{\phi}</math> by directly minimizing the expected risk, without having to explicitly model the probability density functions.
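As an illustrative example (the square loss here is only a standard worked case, not a result tied to this section's references), the conditional risk of the square margin loss <math>\phi(\upsilon)=(1-\upsilon)^2</math> is minimized by
:<math>f^*_{\phi}(\vec{x})=\underset{v}{\operatorname{arg\,min}}\left[\eta(1-v)^2+(1-\eta)(1+v)^2\right]=2\eta-1,</math>
where <math>\eta=p(y=1|\vec{x})</math>. Since <math>\operatorname{sgn}(2\eta-1)</math> predicts the more probable class, <math>\operatorname{sgn}(f^*_{\phi})</math> coincides with the Bayes decision rule, so the square loss is Bayes consistent.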
For convex margin loss <math>\phi(\upsilon)</math>, it can be shown that <math>\phi(\upsilon)</math> is Bayes consistent if and only if it is differentiable at 0 and <math>\phi'(0)<0</math>.<ref>{{Cite journal|last=Bartlett|first=Peter L.|last2=Jordan|first2=Michael I.|last3=Mcauliffe|first3=Jon D.|date=2006|title=Convexity, Classification, and Risk Bounds|journal=Journal of the American Statistical Association|volume=101|issue=473|pages=138–156}}</ref> This result does not, however, restrict Bayes consistency to convex losses: Bayes consistent loss functions can also be generated from the formulation
:<math>\phi(v)=C[f^{-1}(v)]+(1-f^{-1}(v))C'[f^{-1}(v)], \;\;\;\;\;(2)</math>
where <math>f(\eta), (0\leq \eta \leq 1)</math> is any invertible function such that <math>f^{-1}(-v)=1-f^{-1}(v)</math> and <math>C(\eta)</math> is any differentiable, strictly concave function such that <math>C(\eta)=C(1-\eta)</math>. Table-I shows the generated Bayes consistent loss functions for some example choices of <math>C(\eta)</math> and <math>f^{-1}(v)</math>. Note that the Savage and Tangent loss are not convex. Such non-convex loss functions have been shown to be useful in dealing with outliers in classification.<ref name=":0" /><ref>{{Cite journal|last=Leistner|first=C.|last2=Saffari|first2=A.|last3=Roth|first3=P. M.|last4=Bischof|first4=H.|date=2009-09|title=On robustness of on-line boosting - a competitive study|url=https://ieeexplore.ieee.org/document/5457451|journal=2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops|pages=1362–1369|doi=10.1109/ICCVW.2009.5457451|isbn=978-1-4244-4442-7}}</ref> For all loss functions generated from (2), the posterior probability <math>p(y=1|\vec{x})</math> can be found using the invertible ''link function'' as <math>p(y=1|\vec{x})=\eta=f^{-1}(v)</math>.
{| class="wikitable"
|+Table-I
!Loss name
!<math>\phi(v)</math>
!<math>C(\eta)</math>
!<math>f^{-1}(v)</math>
!<math>f(\eta)</math>
|-
|Exponential
|<math>e^{-v}</math>
|<math>2\sqrt{\eta(1-\eta)}</math>
|<math>\frac{e^{2v}}{1+e^{2v}}</math>
|<math>\frac{1}{2}\log\left(\frac{\eta}{1-\eta}\right)</math>
|-
|Logistic
|<math>\frac{1}{\log(2)}\log(1+e^{-v})</math>
|<math>\frac{1}{\log(2)}\left[-\eta\log(\eta)-(1-\eta)\log(1-\eta)\right]</math>
|<math>\frac{e^{v}}{1+e^{v}}</math>
|<math>\log\left(\frac{\eta}{1-\eta}\right)</math>
|-
|Square
|<math>(1-v)^2</math>
|<math>4\eta(1-\eta)</math>
|<math>\frac{1}{2}(v+1)</math>
|<math>2\eta-1</math>
|-
|Savage
|<math>\frac{1}{(1+e^{v})^2}</math>
|<math>\eta(1-\eta)</math>
|<math>\frac{e^{v}}{1+e^{v}}</math>
|<math>\log\left(\frac{\eta}{1-\eta}\right)</math>
|-
|Tangent
|<math>(2\arctan(v)-1)^2</math>
|<math>4\eta(1-\eta)</math>
|<math>\arctan(v)+\frac{1}{2}</math>
|<math>\tan\left(\eta-\frac{1}{2}\right)</math>
|}
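The following minimal sketch (illustrative only; the NumPy dependency and function names are assumptions, not part of any cited implementation) checks numerically that formulation (2), instantiated with <math>C(\eta)=4\eta(1-\eta)</math> and <math>f^{-1}(v)=\tfrac{1}{2}(v+1)</math> from the Square row of Table-I, reproduces the square loss and that the link function recovers the posterior probability.
<syntaxhighlight lang="python">
import numpy as np

# Ingredients of formulation (2) for the Square row of Table-I.
def C(eta):
    return 4 * eta * (1 - eta)        # strictly concave, symmetric about eta = 1/2

def C_prime(eta):
    return 4 - 8 * eta                # derivative of C

def f_inv(v):
    return (v + 1) / 2                # inverse link; satisfies f_inv(-v) = 1 - f_inv(v)

def phi(v):                           # equation (2)
    eta = f_inv(v)
    return C(eta) + (1 - eta) * C_prime(eta)

v = np.linspace(-0.9, 0.9, 7)
assert np.allclose(phi(v), (1 - v) ** 2)     # (2) reproduces the square loss (1 - v)^2
assert np.isclose(f_inv(2 * 0.7 - 1), 0.7)   # link recovers p(y=1|x) = 0.7 from v = 2*eta - 1
</syntaxhighlight>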
== Tangent loss ==
The Tangent loss<ref>{{Cite journal|last=Masnadi-Shirazi|first=H.|last2=Mahadevan|first2=V.|last3=Vasconcelos|first3=N.|date=2010-06|title=On the design of robust classifiers for computer vision|url=https://ieeexplore.ieee.org/document/5540136|journal=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition|pages=779–786|doi=10.1109/CVPR.2010.5540136|isbn=978-1-4244-6984-0}}</ref> can be generated using (2) and Table-I as follows:
:<math>\phi(v)=C[f^{-1}(v)]+(1-f^{-1}(v))C'[f^{-1}(v)] = 4(\arctan(v)+\frac{1}{2})(1-(\arctan(v)+\frac{1}{2}))+(1-(\arctan(v)+\frac{1}{2}))(4-8(\arctan(v)+\frac{1}{2})) = (2\arctan(v)-1)^2.</math>
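A brief symbolic check (illustrative; the use of SymPy is an assumption and not part of the cited work) confirms that the expression above simplifies to the stated closed form.
<syntaxhighlight lang="python">
import sympy as sp

v = sp.symbols('v', real=True)
a = sp.atan(v) + sp.Rational(1, 2)         # f^{-1}(v) for the Tangent loss from Table-I
phi = 4*a*(1 - a) + (1 - a)*(4 - 8*a)      # formulation (2) with C(eta) = 4*eta*(1 - eta)

# The difference with the closed form (2*arctan(v) - 1)^2 simplifies to zero.
assert sp.simplify(phi - (2*sp.atan(v) - 1)**2) == 0
</syntaxhighlight>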
The Tangent loss is quasi-convex and is bounded for large negative values, which makes it less sensitive to outliers. It also assigns a bounded penalty to data points that have been classified "too correctly", which can help prevent overtraining on the data set. The Tangent loss has been used in [[gradient boosting]], the TangentBoost algorithm, and Alternating Decision Forests.<ref>{{Cite journal|last=Schulter|first=S.|last2=Wohlhart|first2=P.|last3=Leistner|first3=C.|last4=Saffari|first4=A.|last5=Roth|first5=P. M.|last6=Bischof|first6=H.|date=2013-06|title=Alternating Decision Forests|url=https://ieeexplore.ieee.org/document/6618916|journal=2013 IEEE Conference on Computer Vision and Pattern Recognition|pages=508–515|doi=10.1109/CVPR.2013.72|isbn=978-0-7695-4989-7}}</ref>
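As a rough numerical illustration (values computed directly from the loss formulas; the comparison with the exponential loss is included only for contrast), the Tangent loss stays bounded on both tails, whereas the exponential loss grows without bound for badly misclassified points.
<syntaxhighlight lang="python">
import numpy as np

def tangent(v):
    return (2 * np.arctan(v) - 1) ** 2    # bounded: tail limits (pi + 1)^2 and (pi - 1)^2

def exponential(v):
    return np.exp(-v)                     # unbounded as v -> -infinity

for v in (-100.0, 0.0, 100.0):
    print(f"v = {v:6.1f}:  tangent = {tangent(v):6.2f},  exponential = {exponential(v):.3e}")
# A badly misclassified point (v = -100) costs about 17 under the Tangent loss but roughly
# e^100 under the exponential loss; a point classified "too correctly" (v = +100) still
# receives a small, bounded penalty of about 4.5.
</syntaxhighlight>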
== Hinge loss ==