the square loss function is both convex and smooth and matches the 0–1 [[indicator function]] when <math>yf(\vec{x}) = 0</math> and when <math>yf(\vec{x}) = 1</math>. However, the square loss function tends to penalize outliers excessively, leading to slower convergence rates (with respect to sample complexity) than the logistic loss or hinge loss functions.<ref name="mit" /> In addition, functions which yield high values of <math>f(\vec{x})</math> for some <math>\vec{x} \in X</math> will perform poorly with the square loss function, since large values of <math>|yf(\vec{x})|</math> are penalized severely, regardless of whether the signs of <math>y</math> and <math>f(\vec{x})</math> match.
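For example, evaluating the square loss <math>(1 - yf(\vec{x}))^2</math> alongside the hinge and logistic losses at a few margin values illustrates how sharply the square loss grows with <math>|yf(\vec{x})|</math> even when the prediction is correct (a small NumPy sketch; the particular margin values are arbitrary and chosen only for illustration):

<syntaxhighlight lang="python">
import numpy as np

# Margins yf(x); large positive values are confidently correct predictions.
margins = np.array([-2.0, 0.0, 1.0, 5.0, 20.0])

square_loss   = (1 - margins) ** 2             # (1 - yf(x))^2
hinge_loss    = np.maximum(0.0, 1 - margins)   # max(0, 1 - yf(x))
logistic_loss = np.log1p(np.exp(-margins))     # log(1 + e^{-yf(x)})

for m, s, h, l in zip(margins, square_loss, hinge_loss, logistic_loss):
    print(f"yf(x) = {m:6.1f}   square = {s:8.1f}   hinge = {h:5.1f}   logistic = {l:10.2e}")
</syntaxhighlight>

At <math>yf(\vec{x}) = 20</math> the square loss is 361 even though the prediction is correct, while the hinge loss is 0 and the logistic loss is negligible.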
A benefit of the square loss function is that its structure lends itself to easy cross-validation of regularization parameters. Specifically, for [[Tikhonov regularization]], one can solve for the regularization parameter using leave-one-out [[cross-validation (statistics)|cross-validation]] in the same time it would take to solve a single problem.<ref>{{Citation |last=Rifkin |first=Ryan M. |last2=Lippert |first2=Ross A. |title=Notes on Regularized Least Squares |publisher=MIT Computer Science and Artificial Intelligence Laboratory |date=1 May 2007 |url=https://dspace.mit.edu/bitstream/handle/1721.1/37318/MIT-CSAIL-TR-2007-025.pdf?sequence=1}}</ref>
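One way this shortcut can be realized is through the identity that each leave-one-out residual equals the ordinary residual divided by <math>1 - H_{ii}</math>, where <math>H</math> is the regularized hat matrix. The NumPy sketch below illustrates this identity under an assumed ridge-regression setup (the function name <code>ridge_loocv_error</code> and the data are illustrative; it performs one solve per candidate parameter, whereas the cited notes describe more efficient variants that reuse a single decomposition across all parameter values):

<syntaxhighlight lang="python">
import numpy as np

def ridge_loocv_error(X, y, lam):
    """Mean leave-one-out squared error for Tikhonov-regularized
    least squares, computed from a single fit on the full data set."""
    n, d = X.shape
    # A = (X^T X + lam I)^{-1} X^T, so that w = A y and H = X A.
    A = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    H = X @ A                          # regularized hat matrix, shape (n, n)
    residuals = y - H @ y              # in-sample residuals
    loo_residuals = residuals / (1.0 - np.diag(H))
    return np.mean(loo_residuals ** 2)

# Choose the regularization parameter by scanning a grid of values.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)
lams = np.logspace(-4, 2, 25)
best_lam = min(lams, key=lambda lam: ridge_loocv_error(X, y, lam))
</syntaxhighlight>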
The minimizer of <math>I[f]</math> for the square loss function is
:<math>f^*_\text{Square}(\vec{x}) = 2p(1\mid\vec{x}) - 1.</math>
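This follows from minimizing the conditional expected loss pointwise. Writing <math>\eta = p(1\mid\vec{x})</math>,
:<math>\mathbb{E}\left[\left(1 - yf(\vec{x})\right)^2 \,\middle|\, \vec{x}\right] = \eta\left(1 - f(\vec{x})\right)^2 + (1-\eta)\left(1 + f(\vec{x})\right)^2,</math>
and setting the derivative with respect to <math>f(\vec{x})</math> to zero gives
:<math>-2\eta\left(1 - f(\vec{x})\right) + 2(1-\eta)\left(1 + f(\vec{x})\right) = 0 \quad\Longrightarrow\quad f(\vec{x}) = 2\eta - 1 = 2p(1\mid\vec{x}) - 1.</math>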