A loss function is said to be ''classification-calibrated'' or ''Bayes consistent'' if its optimal <math>f^*_{\phi}</math> is such that <math>f^*_{0/1}(\vec{x}) = \operatorname{sgn}(f^*_{\phi}(\vec{x}))</math>, and it is thus optimal under the Bayes decision rule. A Bayes consistent loss function allows us to find the Bayes optimal decision function <math>f^*_{\phi}</math> by directly minimizing the expected risk, without having to explicitly model the probability density functions.
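As an illustrative example (the square loss here is only a standard worked case, not a result tied to this section's references), the conditional risk of the square margin loss <math>\phi(\upsilon)=(1-\upsilon)^2</math> is minimized by
:<math>f^*_{\phi}(\vec{x})=\underset{v}{\operatorname{arg\,min}}\left[\eta(1-v)^2+(1-\eta)(1+v)^2\right]=2\eta-1,</math>
where <math>\eta=p(y=1|\vec{x})</math>. Since <math>\operatorname{sgn}(2\eta-1)</math> predicts the more probable class, <math>\operatorname{sgn}(f^*_{\phi})</math> coincides with the Bayes decision rule, so the square loss is Bayes consistent.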
For convex margin loss <math>\phi(\upsilon)</math>, it can be shown that <math>\phi(\upsilon)</math> is Bayes consistent if and only if it is differentiable at 0 and <math>\phi'(0)<0</math>.<ref>{{Cite journal|last=Bartlett|first=Peter L.|last2=Jordan|first2=Michael I.|last3=Mcauliffe|first3=Jon D.|date=2006|title=Convexity, Classification, and Risk Bounds|journal=Journal of the American Statistical Association|volume=101|issue=473|pages=138–156}}</ref> This result does not, however, restrict Bayes consistency to convex losses: Bayes consistent loss functions can also be generated from the formulation
:<math>\phi(v)=C[f^{-1}(v)]+(1-f^{-1}(v))C'[f^{-1}(v)], \;\;\;\;\;(2)</math>
where <math>f(\eta), (0\leq \eta \leq 1)</math> is any invertible function such that <math>f^{-1}(-v)=1-f^{-1}(v)</math> and <math>C(\eta)</math> is any differentiable, strictly concave function such that <math>C(\eta)=C(1-\eta)</math>. Table-I shows the generated Bayes consistent loss functions for some example choices of <math>C(\eta)</math> and <math>f^{-1}(v)</math>. Note that the Savage and Tangent loss are not convex. Such non-convex loss functions have been shown to be useful in dealing with outliers in classification.<ref name=":0" /><ref>{{Cite journal|last=Leistner|first=C.|last2=Saffari|first2=A.|last3=Roth|first3=P. M.|last4=Bischof|first4=H.|date=2009-09|title=On robustness of on-line boosting - a competitive study|url=https://ieeexplore.ieee.org/document/5457451|journal=2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops|pages=1362–1369|doi=10.1109/ICCVW.2009.5457451|isbn=978-1-4244-4442-7}}</ref> For all loss functions generated from (2), the posterior probability <math>p(y=1|\vec{x})</math> can be found using the invertible ''link function'' as <math>p(y=1|\vec{x})=\eta=f^{-1}(v)</math>.
{| class="wikitable"
|+Table-I
!Loss name
!<math>\phi(v)</math>
!<math>C(\eta)</math>
!<math>f^{-1}(v)</math>
!<math>f(\eta)</math>
|-
|Exponential
|<math>e^{-v}</math>
|<math>2\sqrt{\eta(1-\eta)}</math>
|<math>\frac{e^{2v}}{1+e^{2v}}</math>
|<math>\frac{1}{2}\log\left(\frac{\eta}{1-\eta}\right)</math>
|-
|Logistic
|<math>\frac{1}{\log(2)}\log(1+e^{-v})</math>
|<math>\frac{1}{\log(2)}\left[-\eta\log(\eta)-(1-\eta)\log(1-\eta)\right]</math>
|<math>\frac{e^{v}}{1+e^{v}}</math>
|<math>\log\left(\frac{\eta}{1-\eta}\right)</math>
|-
|Square
|<math>(1-v)^2</math>
|<math>4\eta(1-\eta)</math>
|<math>\frac{1}{2}(v+1)</math>
|<math>2\eta-1</math>
|-
|Savage
|<math>\frac{1}{(1+e^{v})^2}</math>
|<math>\eta(1-\eta)</math>
|<math>\frac{e^{v}}{1+e^{v}}</math>
|<math>\log\left(\frac{\eta}{1-\eta}\right)</math>
|-
|Tangent
|<math>(2\arctan(v)-1)^2</math>
|<math>4\eta(1-\eta)</math>
|<math>\arctan(v)+\frac{1}{2}</math>
|<math>\tan\left(\eta-\frac{1}{2}\right)</math>
|}
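The following minimal sketch (illustrative only; the NumPy dependency and function names are assumptions, not part of any cited implementation) checks numerically that formulation (2), instantiated with <math>C(\eta)=4\eta(1-\eta)</math> and <math>f^{-1}(v)=\tfrac{1}{2}(v+1)</math> from the Square row of Table-I, reproduces the square loss and that the link function recovers the posterior probability.
<syntaxhighlight lang="python">
import numpy as np

# Ingredients of formulation (2) for the Square row of Table-I.
def C(eta):
    return 4 * eta * (1 - eta)        # strictly concave, symmetric about eta = 1/2

def C_prime(eta):
    return 4 - 8 * eta                # derivative of C

def f_inv(v):
    return (v + 1) / 2                # inverse link; satisfies f_inv(-v) = 1 - f_inv(v)

def phi(v):                           # equation (2)
    eta = f_inv(v)
    return C(eta) + (1 - eta) * C_prime(eta)

v = np.linspace(-0.9, 0.9, 7)
assert np.allclose(phi(v), (1 - v) ** 2)     # (2) reproduces the square loss (1 - v)^2
assert np.isclose(f_inv(2 * 0.7 - 1), 0.7)   # link recovers p(y=1|x) = 0.7 from v = 2*eta - 1
</syntaxhighlight>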
== Tangent loss ==
The Tangent loss<ref>{{Cite journal|last=Masnadi-Shirazi|first=H.|last2=Mahadevan|first2=V.|last3=Vasconcelos|first3=N.|date=2010-06|title=On the design of robust classifiers for computer vision|url=https://ieeexplore.ieee.org/document/5540136|journal=2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition|pages=779–786|doi=10.1109/CVPR.2010.5540136|isbn=978-1-4244-6984-0}}</ref> can be generated using (2) and Table-I as follows:
:<math>\phi(v)=C[f^{-1}(v)]+(1-f^{-1}(v))C'[f^{-1}(v)] = 4(\arctan(v)+\frac{1}{2})(1-(\arctan(v)+\frac{1}{2}))+(1-(\arctan(v)+\frac{1}{2}))(4-8(\arctan(v)+\frac{1}{2})) = (2\arctan(v)-1)^2.</math>
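A brief symbolic check (illustrative; the use of SymPy is an assumption and not part of the cited work) confirms that the expression above simplifies to the stated closed form.
<syntaxhighlight lang="python">
import sympy as sp

v = sp.symbols('v', real=True)
a = sp.atan(v) + sp.Rational(1, 2)         # f^{-1}(v) for the Tangent loss from Table-I
phi = 4*a*(1 - a) + (1 - a)*(4 - 8*a)      # formulation (2) with C(eta) = 4*eta*(1 - eta)

# The difference with the closed form (2*arctan(v) - 1)^2 simplifies to zero.
assert sp.simplify(phi - (2*sp.atan(v) - 1)**2) == 0
</syntaxhighlight>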
The Tangent loss is quasi-convex and is bounded for large negative values, which makes it less sensitive to outliers. It also assigns a bounded penalty to data points that have been classified "too correctly", which can help prevent overtraining on the data set. The Tangent loss has been used in [[gradient boosting]], the TangentBoost algorithm, and Alternating Decision Forests.<ref>{{Cite journal|last=Schulter|first=S.|last2=Wohlhart|first2=P.|last3=Leistner|first3=C.|last4=Saffari|first4=A.|last5=Roth|first5=P. M.|last6=Bischof|first6=H.|date=2013-06|title=Alternating Decision Forests|url=https://ieeexplore.ieee.org/document/6618916|journal=2013 IEEE Conference on Computer Vision and Pattern Recognition|pages=508–515|doi=10.1109/CVPR.2013.72|isbn=978-0-7695-4989-7}}</ref>
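As a rough numerical illustration (values computed directly from the loss formulas; the comparison with the exponential loss is included only for contrast), the Tangent loss stays bounded on both tails, whereas the exponential loss grows without bound for badly misclassified points.
<syntaxhighlight lang="python">
import numpy as np

def tangent(v):
    return (2 * np.arctan(v) - 1) ** 2    # bounded: tail limits (pi + 1)^2 and (pi - 1)^2

def exponential(v):
    return np.exp(-v)                     # unbounded as v -> -infinity

for v in (-100.0, 0.0, 100.0):
    print(f"v = {v:6.1f}:  tangent = {tangent(v):6.2f},  exponential = {exponential(v):.3e}")
# A badly misclassified point (v = -100) costs about 17 under the Tangent loss but roughly
# e^100 under the exponential loss; a point classified "too correctly" (v = +100) still
# receives a small, bounded penalty of about 4.5.
</syntaxhighlight>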
== Hinge loss ==