==Proper loss functions, loss margin and regularization==
[[File:LogitLossMarginWithMu.jpg|alt=|thumb|(Red) standard logistic loss (<math>\gamma=1, \mu=2</math>) and (Blue) increased margin logistic loss (<math>\gamma=0.2</math>).]]
For proper loss functions, the ''loss margin'' can be defined as <math>\mu_{\phi}=-\frac{\phi'(0)}{\phi''(0)}</math> and shown to be directly related to the regularization properties of the classifier.<ref>{{Cite journal|last1=Vasconcelos|first1=Nuno|last2=Masnadi-Shirazi|first2=Hamed|date=2015|title=A View of Margin Losses as Regularizers of Probability Estimates|url=http://jmlr.org/papers/v16/masnadi15a.html|journal=Journal of Machine Learning Research|volume=16|issue=85|pages=2751–2795|issn=1533-7928}}</ref> Specifically, a loss function with a larger margin increases regularization and produces better estimates of the posterior probability. For example, the margin of the logistic loss can be increased by introducing a parameter <math>\gamma</math> and writing the loss as <math>\frac{1}{\gamma}\log(1+e^{-\gamma v})</math>, where a smaller <math>\gamma</math> in the range <math>0<\gamma<1</math> increases the margin. This can be shown to be directly equivalent to decreasing the learning rate in [[gradient boosting]], <math>F_m(x) = F_{m-1}(x) + \gamma h_m(x),</math> where decreasing <math>\gamma</math> improves the regularization of the boosted classifier. The theory also implies that when a learning rate of <math>\gamma</math> is used, the posterior probability must be recovered as <math>\eta=f^{-1}(\gamma F(x))</math>.
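
As a rough numerical illustration of the definition above, the following sketch estimates the loss margin of the <math>\gamma</math>-scaled logistic loss by finite differences. Differentiating by hand gives <math>\phi'(0)=-\tfrac{1}{2}</math> and <math>\phi''(0)=\tfrac{\gamma}{4}</math>, so the margin should be <math>\mu_{\phi}=2/\gamma</math>, which matches <math>\mu=2</math> at <math>\gamma=1</math>. The function names, the step size <code>h</code>, and the example score are illustrative choices, and the posterior helper assumes the logistic inverse link <math>f^{-1}(v)=\frac{1}{1+e^{-v}}</math>; none of these are taken from the cited paper.

```python
import math


def logistic_loss(v, gamma=1.0):
    """Scaled logistic loss (1/gamma) * log(1 + exp(-gamma * v))."""
    return math.log1p(math.exp(-gamma * v)) / gamma


def loss_margin(phi, h=1e-4):
    """Estimate mu = -phi'(0) / phi''(0) using central finite differences."""
    d1 = (phi(h) - phi(-h)) / (2.0 * h)               # approximates phi'(0)
    d2 = (phi(h) - 2.0 * phi(0.0) + phi(-h)) / (h * h)  # approximates phi''(0)
    return -d1 / d2


def posterior(score, gamma=1.0):
    """eta = f^{-1}(gamma * F(x)), assuming the logistic inverse link."""
    return 1.0 / (1.0 + math.exp(-gamma * score))


if __name__ == "__main__":
    for gamma in (1.0, 0.2):
        mu = loss_margin(lambda v: logistic_loss(v, gamma))
        eta = posterior(3.0, gamma)  # F(x) = 3.0 is an arbitrary example score
        print(f"gamma={gamma}: margin ~ {mu:.3f}, posterior ~ {eta:.3f}")
```

Under these assumptions, reducing <math>\gamma</math> from 1 to 0.2 raises the estimated margin from about 2 to about 10, and the same boosted score maps to a posterior closer to 0.5 once it is rescaled by <math>\gamma</math>.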