==Proper loss functions, loss margin and regularization==
[[File:LogitLossMarginWithMu.jpg|alt=|thumb|(Red) standard Logistic loss (<math>\gamma=1, \mu=2</math>) and (Blue) increased margin Logistic loss (<math>\gamma=0.2</math>)]]
For proper loss functions, the ''loss margin'' can be defined as <math>\mu_{\phi}=-\frac{\phi'(0)}{\phi''(0)}</math> and shown to be directly related to the regularization properties of the classifier.<ref>{{Cite journal|last1=Vasconcelos|first1=Nuno|last2=Masnadi-Shirazi|first2=Hamed|date=2015|title=A View of Margin Losses as Regularizers of Probability Estimates|url=http://jmlr.org/papers/v16/masnadi15a.html|journal=Journal of Machine Learning Research|volume=16|issue=85|pages=2751–2795|issn=1533-7928}}</ref> Specifically, a loss function with a larger margin increases regularization and produces better estimates of the posterior probability. For example, the margin of the logistic loss can be increased by introducing a parameter <math>\gamma</math> and writing the logistic loss as <math>\frac{1}{\gamma}\log(1+e^{-\gamma v})</math>, where a smaller <math>\gamma</math> (with <math>0<\gamma<1</math>) yields a larger margin. This can be shown to be directly equivalent to decreasing the learning rate in [[gradient boosting]], <math>F_m(x) = F_{m-1}(x) + \gamma h_m(x),</math> where decreasing <math>\gamma</math> improves the regularization of the boosted classifier. The theory makes it clear that when a learning rate <math>\gamma</math> is used, the correct formula for retrieving the posterior probability becomes <math>\eta=f^{-1}(\gamma F(x))</math>.
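As an illustration (a minimal numerical sketch, not from the cited reference; the classifier output <code>F_x</code> below is a hypothetical value), the margin of the <math>\gamma</math>-parametrized logistic loss can be checked to equal <math>2/\gamma</math>, so that smaller <math>\gamma</math> indeed gives a larger margin, and the posterior probability can be recovered as <math>\eta=f^{-1}(\gamma F(x))</math> using the logistic link <math>f^{-1}(v)=\frac{e^v}{1+e^v}</math>:
<syntaxhighlight lang="python">
import numpy as np

def logistic_loss(v, gamma=1.0):
    """Margin-parametrized logistic loss: (1/gamma) * log(1 + exp(-gamma*v))."""
    return np.log1p(np.exp(-gamma * v)) / gamma

def loss_margin(phi, eps=1e-3):
    """Loss margin mu = -phi'(0) / phi''(0), estimated with central differences."""
    d1 = (phi(eps) - phi(-eps)) / (2.0 * eps)
    d2 = (phi(eps) - 2.0 * phi(0.0) + phi(-eps)) / eps ** 2
    return -d1 / d2

for gamma in (1.0, 0.5, 0.2):
    mu = loss_margin(lambda v: logistic_loss(v, gamma))
    print(f"gamma = {gamma}: estimated margin = {mu:.2f}, expected 2/gamma = {2.0 / gamma:.2f}")

# Recovering the posterior probability from a classifier trained with
# learning rate gamma: eta = f^{-1}(gamma * F(x)) with the logistic link.
gamma, F_x = 0.2, 1.3   # F_x is a hypothetical classifier output at some point x
eta = 1.0 / (1.0 + np.exp(-gamma * F_x))
print(f"estimated posterior p(1|x) = {eta:.3f}")
</syntaxhighlight>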
:<math>
\begin{align}
\phi(v) & = C[f^{-1}(v)]+\left( 1-f^{-1}(v)\right) C'[f^{-1}(v)]
\\ & = 4 \left( \arctan(v)+\frac{1}{2} \right) \left( 1- \left( \arctan(v)+\frac{1}{2} \right) \right) + \left( 1- \left( \arctan(v)+\frac{1}{2} \right) \right) \left( 4-8 \left( \arctan(v)+\frac{1}{2} \right) \right) \\
& = \left( 2\arctan(v)-1 \right) ^2.
\end{align}
</math>
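The last step follows by writing <math>u=\arctan(v)+\tfrac{1}{2}</math>, since
:<math>4u(1-u)+(1-u)(4-8u)=4u^2-8u+4=4\left(u-1\right)^2=\left(2\arctan(v)-1\right)^2.</math>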
The minimizer of <math>I[f]</math> for the Tangent loss function can be directly found from equation (1) as
:<math>f^*_\text{Tangent}= \tan \left( \eta-\frac{1}{2} \right) =\tan \left( p \left( 1\mid x \right) -\frac{1}{2}\right) .</math>
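This is obtained by inverting the link <math>\eta=f^{-1}(v)=\arctan(v)+\tfrac{1}{2}</math> used above: solving for <math>v</math> gives <math>v=\tan\left(\eta-\tfrac{1}{2}\right)</math>.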
== Hinge loss ==
{{Reflist}}
{{Artificial intelligence navbox}}
[[Category:Machine learning algorithms]]