The minimizer of the expected risk under the zero–one loss is the Bayes optimal decision rule:
:<math>f^*_{0/1}(\vec{x}) \;=\; \begin{cases} \;\;\;1& \text{if }p(1\mid\vec{x}) > p(-1\mid \vec{x}) \\ \;\;\;0 & \text{if }p(1\mid\vec{x}) = p(-1\mid\vec{x}) \\ -1 & \text{if }p(1\mid\vec{x}) < p(-1\mid\vec{x}) \end{cases}</math>.
A loss function <math>\phi</math> is said to be ''classification-calibrated'' or ''Bayes consistent'' if the minimizer <math>f^{*}_{\phi}</math> of its expected risk implements the Bayes optimal decision rule, i.e. <math>f^*_{0/1}(\vec{x}) = \operatorname{sgn}(f^{*}_{\phi}(\vec{x}))</math>.
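The decision rule itself is simple to state in code. The following is a minimal, purely illustrative Python sketch, assuming the posterior probability <math>p(1\mid\vec{x})</math> has already been estimated (the function name is an assumption, not from any cited source); since the labels are binary, <math>p(-1\mid\vec{x}) = 1 - p(1\mid\vec{x})</math>, so the rule reduces to comparing the posterior with 1/2.

<syntaxhighlight lang="python">
def bayes_decision(p_pos: float) -> int:
    """Zero-one Bayes optimal decision given p_pos = p(1 | x).

    Returns 1 when the positive class is more probable, -1 when the
    negative class is more probable, and 0 on a tie (either label is
    then optimal).
    """
    if p_pos > 0.5:
        return 1
    if p_pos < 0.5:
        return -1
    return 0
</syntaxhighlight>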
For a convex margin loss <math>\phi(v)</math>, it can be shown that <math>\phi(v)</math> is Bayes consistent if and only if it is differentiable at 0 and <math>\phi'(0)<0</math><ref>{{Cite journal|last=Bartlett|first=Peter L.|last2=Jordan|first2=Michael I.|last3=Mcauliffe|first3=Jon D.|date=2006|title=Convexity, Classification, and Risk Bounds|url=https://www.jstor.org/stable/30047445|journal=Journal of the American Statistical Association|volume=101|issue=473|pages=138–156|issn=0162-1459}}</ref><ref name="mit" />. However, this result does not exclude the existence of non-convex Bayes consistent loss functions. A more general result states that Bayes consistent loss functions can be generated using the following formulation<ref name="robust">{{Citation|last=Masnadi-Shirazi|first=Hamed|last2=Vasconcelos|first2=Nuno|title=On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost|url=http://www.svcl.ucsd.edu/publications/conference/2008/nips08/NIPS08LossesWITHTITLE.pdf|publisher=Statistical Visual Computing Laboratory, University of California, San Diego|accessdate=6 December 2014}}</ref>:
:<math>\phi(v)=C[f^{-1}(v)]+\left(1-f^{-1}(v)\right)C'[f^{-1}(v)],</math>
where <math>f(\eta)</math>, <math>0\leq \eta \leq 1</math>, is any invertible function such that <math>f^{-1}(-v)=1-f^{-1}(v)</math>, and <math>C(\eta)</math> is any differentiable strictly concave function such that <math>C(\eta)=C(1-\eta)</math>. Table-I shows the generated Bayes consistent loss functions for several choices of <math>C(\eta)</math> and <math>f^{-1}(v)</math>; a worked example for the Savage loss is given after the table. Note that the Savage and Tangent losses are not convex. Such non-convex loss functions have been shown to be useful in dealing with outliers in classification<ref name="robust" /><ref>{{Cite journal|last=Leistner|first=C.|last2=Saffari|first2=A.|last3=Roth|first3=P. M.|last4=Bischof|first4=H.|date=September 2009|title=On robustness of on-line boosting – a competitive study|url=https://ieeexplore.ieee.org/document/5457451|journal=2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops|pages=1362–1369|doi=10.1109/ICCVW.2009.5457451}}</ref>.
{| class="wikitable"
|+Table-I
!Loss Name
!<math>\phi(v)</math>
!<math>C(\eta)</math>
!<math>f^{-1}(v)</math>
|-
|Exponential
|<math>e^{-v}</math>
|<math>2\sqrt{\eta(1-\eta)}</math>
|<math>\frac{e^{2v}}{1+e^{2v}}</math>
|-
|Logistic
|<math>\frac{1}{\ln(2)}\ln(1+e^{-v})</math>
|<math>\frac{1}{\ln(2)}[-\eta\ln(\eta)-(1-\eta)\ln(1-\eta)]</math>
|<math>\frac{e^v}{1+e^v}</math>
|-
|Square
|<math>(1-v)^2</math>
|<math>4\eta(1-\eta)</math>
|<math>\frac{1}{2}(v+1)</math>
|-
|Savage
|<math>\frac{1}{(1+e^v)^2}</math>
|<math>\eta(1-\eta)</math>
|<math>\frac{e^v}{1+e^v}</math>
|-
|Tangent
|<math>(2\arctan(v)-1)^2</math>
|<math>4\eta(1-\eta)</math>
|<math>\arctan(v)+\frac{1}{2}</math>
|}
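As a worked example of the formulation above, substituting <math>C(\eta)=\eta(1-\eta)</math> and <math>\eta=f^{-1}(v)=\frac{e^v}{1+e^v}</math>, so that <math>C'(\eta)=1-2\eta</math>, gives
:<math>\phi(v)=\eta(1-\eta)+(1-\eta)(1-2\eta)=(1-\eta)^2=\left(\frac{1}{1+e^v}\right)^2=\frac{1}{(1+e^v)^2},</math>
which is the Savage loss of Table-I. The construction can also be checked numerically; the following is a minimal sketch (Python with NumPy is assumed; it verifies only the Savage row of Table-I and is not taken from the cited references):

<syntaxhighlight lang="python">
import numpy as np

v = np.linspace(-3.0, 3.0, 101)

# Savage choices from Table-I: C(eta) = eta(1 - eta), f^{-1}(v) = e^v / (1 + e^v)
eta = np.exp(v) / (1.0 + np.exp(v))
C = eta * (1.0 - eta)
C_prime = 1.0 - 2.0 * eta

# phi(v) = C[f^{-1}(v)] + (1 - f^{-1}(v)) C'[f^{-1}(v)]
generated = C + (1.0 - eta) * C_prime

# Closed form from Table-I: phi(v) = 1 / (1 + e^v)^2
closed_form = 1.0 / (1.0 + np.exp(v)) ** 2

assert np.allclose(generated, closed_form)
</syntaxhighlight>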
==Simplifying expected risk for classification==