For a convex margin loss <math>\phi(v)</math>, it can be shown that <math>\phi(v)</math> is Bayes consistent if and only if it is differentiable at 0 and <math>\phi'(0)<0</math>.<ref>{{Cite journal|last=Bartlett|first=Peter L.|last2=Jordan|first2=Michael I.|last3=Mcauliffe|first3=Jon D.|date=2006|title=Convexity, Classification, and Risk Bounds|url=https://www.jstor.org/stable/30047445|journal=Journal of the American Statistical Association|volume=101|issue=473|pages=138–156|issn=0162-1459}}</ref><ref name="mit" /> However, this result does not exclude the existence of non-convex Bayes consistent loss functions. A more general result states that Bayes consistent loss functions can be generated using the following formulation:<ref name="robust">{{Citation|last=Masnadi-Shirazi|first=Hamed|last2=Vasconcelos|first2=Nuno|title=On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost|url=http://www.svcl.ucsd.edu/publications/conference/2008/nips08/NIPS08LossesWITHTITLE.pdf|publisher=Statistical Visual Computing Laboratory, University of California, San Diego|accessdate=6 December 2014}}</ref>
<math>\phi(v)=C[f^{-1}(v)]+(1-f^{-1}(v))C'[f^{-1}(v)] \;\;\;\;\;(2)</math>,
where <math>f(\eta)</math>, <math>0\leq \eta \leq 1</math>, is any invertible function such that <math>f^{-1}(-v)=1-f^{-1}(v)</math> and <math>C(\eta)</math> is any differentiable strictly concave function such that <math>C(\eta)=C(1-\eta)</math>. Table-I shows the generated Bayes consistent loss functions for some example choices of <math>C(\eta)</math> and <math>f^{-1}(v)</math>. Note that the Savage and Tangent losses are not convex. Such non-convex loss functions have been shown to be useful in dealing with outliers in classification.<ref name="robust" /><ref>{{Cite journal|last=Leistner|first=C.|last2=Saffari|first2=A.|last3=Roth|first3=P. M.|last4=Bischof|first4=H.|date=September 2009|title=On robustness of on-line boosting - a competitive study|url=https://ieeexplore.ieee.org/document/5457451|journal=2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops|pages=1362–1369|doi=10.1109/ICCVW.2009.5457451}}</ref>
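As a worked illustration of (2), consider the admissible pair <math>C(\eta)=2\sqrt{\eta(1-\eta)}</math> and <math>f^{-1}(v)=\frac{e^{2v}}{1+e^{2v}}</math> (one possible choice satisfying the conditions above). Substituting into (2) yields

<math>\phi(v)=\frac{2e^{v}}{1+e^{2v}}+\frac{1}{1+e^{2v}}\left(e^{-v}-e^{v}\right)=\frac{e^{v}+e^{-v}}{1+e^{2v}}=e^{-v},</math>

that is, the exponential loss, with the corresponding <math>f(\eta)=\frac{1}{2}\log\frac{\eta}{1-\eta}</math>.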
{| class="wikitable"
|+Table-I
|<math>\arctan(v)+\frac{1}{2}</math>
|<math>\tan(\eta-\frac{1}{2})</math>
|}
The sole minimizer of the expected risk, <math>f^*</math>, associated with the above generated loss functions can be found directly from equation (1) and is equal to the corresponding <math>f(\eta)</math>. This holds even for the non-convex loss functions, which means that gradient descent based algorithms such as [[gradient boosting]] can be used to construct the minimizer.
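A minimal numerical sketch of this claim (assuming that equation (1) amounts to minimizing the conditional risk <math>\eta\phi(f)+(1-\eta)\phi(-f)</math> pointwise, and using the exponential loss derived in the worked example above) compares a brute-force grid minimizer with the closed form <math>f(\eta)</math>:

<syntaxhighlight lang="python">
import numpy as np

def conditional_risk(f, eta, phi):
    # Conditional expected risk of predicting score f when P(y=1|x) = eta:
    # eta * phi(f) + (1 - eta) * phi(-f)
    return eta * phi(f) + (1 - eta) * phi(-f)

# Exponential loss, phi(v) = exp(-v), generated by equation (2) above
phi = lambda v: np.exp(-v)

for eta in [0.2, 0.5, 0.9]:
    f_grid = np.linspace(-5.0, 5.0, 200001)      # dense grid of candidate scores
    risks = conditional_risk(f_grid, eta, phi)   # conditional risk at every candidate
    f_star = f_grid[np.argmin(risks)]            # numerical minimizer over the grid
    f_eta = 0.5 * np.log(eta / (1.0 - eta))      # closed form f(eta) from the worked example
    print(f"eta={eta:.1f}  grid minimizer={f_star:+.3f}  f(eta)={f_eta:+.3f}")
</syntaxhighlight>

For each <math>\eta</math> the grid minimizer agrees with <math>\tfrac{1}{2}\log\tfrac{\eta}{1-\eta}</math> up to the grid resolution, in line with the statement above.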
== Square loss ==
While more commonly used in regression, the square loss function can be re-written as a function <math>\phi(yf(\vec{x}))</math> and utilized for classification. Defined as <math>V(f(\vec{x}),y)=(1-yf(\vec{x}))^2</math>, it coincides with the usual squared error <math>(y-f(\vec{x}))^2</math> for labels <math>y\in\{-1,+1\}</math>, since in that case <math>(y-f(\vec{x}))^2=y^2(1-yf(\vec{x}))^2=(1-yf(\vec{x}))^2</math>.
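As a cross-check against the general construction above (a sketch assuming the admissible pair <math>C(\eta)=4\eta(1-\eta)</math> and <math>f^{-1}(v)=\tfrac{1}{2}(v+1)</math>, one possible choice satisfying the stated conditions), equation (2) reproduces this margin form of the square loss:

<math>\phi(v)=\left(1-v^2\right)+\frac{1-v}{2}\left(-4v\right)=1-2v+v^2=(1-v)^2,</math>

with the corresponding minimizer <math>f(\eta)=2\eta-1</math>, which for labels in <math>\{-1,+1\}</math> is exactly the regression target <math>\mathbb{E}[y\mid\vec{x}\,]=2\eta-1</math>.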