Loss functions for classification: Difference between revisions

update URL for Piyush PDF since old URL broken
Bayes consistency is \phi'(0)<0; originally said \phi'(0)=0 which is incorrect. See Thm 2 in Bartlett et al. 2006.
Line 55:
A loss function is said to be ''classification-calibrated'' or ''Bayes consistent'' if its optimal <math>f^*_{\phi}</math> is such that <math>f^*_{0/1}(\vec{x}) = \operatorname{sgn}(f^*_{\phi}(\vec{x}))</math> and is thus optimal under the Bayes decision rule. A Bayes consistent loss function allows us to find the Bayes optimal decision function <math>f^*_{\phi}</math> by directly minimizing the expected risk, without having to explicitly model the probability density functions.
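
The definition above can be checked numerically for a few familiar margin losses. The following is a minimal sketch, not a definitive implementation: it assumes the conditional class probability is written as <code>eta</code> (a name not used in the text above), approximates the pointwise minimizer of the conditional risk by a grid search, and compares its sign with the Bayes decision rule. All function names are hypothetical and chosen for illustration only.

```python
import math

def logistic_loss(v):
    # log(1 + exp(-v)), written in a numerically stable form
    return math.log1p(math.exp(-abs(v))) + max(-v, 0.0)

def exponential_loss(v):
    return math.exp(-v)

def square_loss(v):
    return (1.0 - v) ** 2

def conditional_risk(phi, eta, f):
    # Expected loss at a point where P(y = +1 | x) = eta (assumed notation)
    return eta * phi(f) + (1.0 - eta) * phi(-f)

def pointwise_minimizer(phi, eta, lo=-5.0, hi=5.0, steps=2001):
    # Crude grid search over candidate scores f; the range is an assumption
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda f: conditional_risk(phi, eta, f))

def sign(x):
    return (x > 0) - (x < 0)

def agrees_with_bayes_rule(phi, etas=(0.1, 0.3, 0.45, 0.55, 0.7, 0.9)):
    # Bayes rule predicts +1 when eta > 1/2 and -1 when eta < 1/2
    return all(
        sign(pointwise_minimizer(phi, eta)) == sign(2 * eta - 1)
        for eta in etas
    )

if __name__ == "__main__":
    for name, phi in [("logistic", logistic_loss),
                      ("exponential", exponential_loss),
                      ("square", square_loss)]:
        print(name, agrees_with_bayes_rule(phi))
```

A grid search only illustrates the property at the sampled probabilities; it does not prove consistency, which requires the analytical conditions discussed in this section.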
 
For a convex margin loss <math>\phi(\upsilon)</math>, it can be shown that <math>\phi(\upsilon)</math> is Bayes consistent if and only if it is differentiable at 0 and <math>\phi'(0)<0</math>.<ref>{{Cite journal|last1=Bartlett|first1=Peter L.|last2=Jordan|first2=Michael I.|last3=McAuliffe|first3=Jon D.|date=2006|title=Convexity, Classification, and Risk Bounds|journal=Journal of the American Statistical Association|volume=101|issue=473|pages=138–156|issn=0162-1459|jstor=30047445|doi=10.1198/016214505000000907|s2cid=2833811}}</ref><ref name="mit" /> Yet, this result does not exclude the existence of non-convex Bayes consistent loss functions. A more general result states that Bayes consistent loss functions can be generated using the following formulation:<ref name=":0">{{Cite journal|last1=Masnadi-Shirazi|first1=Hamed|last2=Vasconcelos|first2=Nuno|date=2008|title=On the Design of Loss Functions for Classification: Theory, Robustness to Outliers, and SavageBoost|url=https://papers.nips.cc/paper/3591-on-the-design-of-loss-functions-for-classification-theory-robustness-to-outliers-and-savageboost.pdf|journal=Proceedings of the 21st International Conference on Neural Information Processing Systems|series=NIPS'08|location=USA|publisher=Curran Associates Inc.|pages=1049–1056|isbn=9781605609492}}</ref>
 
:<math>\phi(v)=C[f^{-1}(v)]+(1-f^{-1}(v))C'[f^{-1}(v)] \;\;\;\;\;(2)</math>,