Loss functions for classification
A loss function is said to be ''classification-calibrated'' or ''Bayes consistent'' if its optimal <math>f^*_{\phi}</math> is such that <math>f^*_{0/1}(\vec{x}) = \operatorname{sgn}(f^*_{\phi}(\vec{x}))</math> and is thus optimal under the Bayes decision rule. A Bayes consistent loss function allows us to find the Bayes optimal decision function <math>f^*_{\phi}</math> by directly minimizing the expected risk, without having to explicitly model the probability density functions.
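For instance, for the exponential loss <math>\phi(v)=e^{-v}</math>, minimizing the conditional expected risk <math>\eta e^{-f(\vec{x})}+(1-\eta)e^{f(\vec{x})}</math>, where <math>\eta=p(y=1|\vec{x})</math>, gives <math>f^*_{\phi}(\vec{x})=\tfrac{1}{2}\log\frac{\eta}{1-\eta}</math>, whose sign is positive exactly when <math>\eta>\tfrac{1}{2}</math>; the exponential loss therefore satisfies <math>f^*_{0/1}(\vec{x}) = \operatorname{sgn}(f^*_{\phi}(\vec{x}))</math> and is Bayes consistent.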
 
For a convex margin loss <math>\phi(\upsilon)</math>, it can be shown that <math>\phi(\upsilon)</math> is Bayes consistent if and only if it is differentiable at 0 and <math>\phi'(0)<0</math>.<ref>{{Cite journal|last=Bartlett|first=Peter L.|last2=Jordan|first2=Michael I.|last3=Mcauliffe|first3=Jon D.|date=2006|title=Convexity, Classification, and Risk Bounds|url=https://www.jstor.org/stable/30047445|journal=Journal of the American Statistical Association|volume=101|issue=473|pages=138–156|issn=0162-1459}}</ref><ref name="mit" /> Yet, this result does not exclude the existence of non-convex Bayes consistent loss functions. A more general result states that Bayes consistent loss functions can be generated using the following formulation<ref name="robust:0">{{Cite journal|last=Masnadi-Shirazi|first=Hamed|last2=Vasconcelos|first2=Nuno|date=2008|title=On the Design of Loss Functions for Classification: Theory, Robustness to Outliers, and SavageBoost|url=http://dl.acm.org/citation.cfm?id=2981780.2981911|journal=Proceedings of the 21st International Conference on Neural Information Processing Systems|series=NIPS'08|___location=USA|publisher=Curran Associates Inc.|pages=1049–1056|isbn=9781605609492}}</ref>
 
<math>\phi(v)=C[f^{-1}(v)]+(1-f^{-1}(v))C'[f^{-1}(v)] \;\;\;\;\;(2)</math>,
 
where <math>f(\eta), (0\leq \eta \leq 1)</math> is any invertible function such that <math>f^{-1}(-v)=1-f^{-1}(v)</math> and <math>C(\eta)</math> is any differentiable strictly concave function such that <math>C(\eta)=C(1-\eta)</math>. Table-I shows the generated Bayes consistent loss functions for some example choices of <math>C(\eta)</math> and <math>f^{-1}(v)</math>. Note that the Savage and Tangent losses are not convex. Such non-convex loss functions have been shown to be useful in dealing with outliers in classification.<ref name="robust:0" /><ref>{{Cite journal|last=Leistner|first=C.|last2=Saffari|first2=A.|last3=Roth|first3=P. M.|last4=Bischof|first4=H.|date=2009|title=On robustness of on-line boosting - a competitive study|url=https://ieeexplore.ieee.org/document/5457451|journal=2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops|pages=1362–1369|doi=10.1109/ICCVW.2009.5457451}}</ref> For all loss functions generated from (2), the posterior probability <math>p(y=1|\vec{x})</math> can be found using the invertible ''link function'' as <math>p(y=1|\vec{x})=\eta=f^{-1}(v)</math>.
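A minimal numerical sketch of the construction in (2), assuming the Savage-loss choices <math>C(\eta)=\eta(1-\eta)</math> and logistic link <math>f^{-1}(v)=e^{v}/(1+e^{v})</math> (the Python function names below are purely illustrative, not from the cited references), is:

<syntaxhighlight lang="python">
import numpy as np

# Equation (2): phi(v) = C[f^{-1}(v)] + (1 - f^{-1}(v)) * C'[f^{-1}(v)],
# illustrated here with the Savage choices C(eta) = eta*(1 - eta) and a logistic link.

def C(eta):
    # strictly concave and symmetric: C(eta) = C(1 - eta)
    return eta * (1.0 - eta)

def C_prime(eta):
    return 1.0 - 2.0 * eta

def f_inv(v):
    # invertible link satisfying f^{-1}(-v) = 1 - f^{-1}(v)
    return 1.0 / (1.0 + np.exp(-v))

def phi(v):
    # the loss generated by equation (2)
    eta = f_inv(v)
    return C(eta) + (1.0 - eta) * C_prime(eta)

v = np.linspace(-5.0, 5.0, 11)
# For these choices, (2) reduces to the (non-convex) Savage loss 1/(1+e^v)^2 ...
assert np.allclose(phi(v), 1.0 / (1.0 + np.exp(v)) ** 2)
# ... and the link function recovers the posterior: p(y=1|x) = eta = f^{-1}(v).
print(f_inv(v))
</syntaxhighlight>
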
{| class="wikitable"
|+Table-I