the square loss function is both convex and smooth and matches the 0–1 [[indicator function]] when <math>yf(\vec{x}) = 0</math> and when <math>yf(\vec{x}) = 1</math>. However, the square loss function tends to penalize outliers excessively, leading to slower convergence rates (with respect to sample complexity) than the logistic loss or hinge loss functions.<ref name="mit" /> In addition, functions which yield high values of <math>f(\vec{x})</math> for some <math>\vec{x} \in X</math> will perform poorly with the square loss function, since large values of <math>|yf(\vec{x})|</math> are penalized severely, regardless of whether the signs of <math>y</math> and <math>f(\vec{x})</math> match.
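For example, evaluating the square loss <math>(1 - yf(\vec{x}))^2</math> alongside the hinge and logistic losses at a few margin values illustrates how sharply the square loss grows with <math>|yf(\vec{x})|</math> even when the prediction is correct (a small NumPy sketch; the particular margin values are arbitrary and chosen only for illustration):

<syntaxhighlight lang="python">
import numpy as np

# Margins yf(x); large positive values are confidently correct predictions.
margins = np.array([-2.0, 0.0, 1.0, 5.0, 20.0])

square_loss   = (1 - margins) ** 2             # (1 - yf(x))^2
hinge_loss    = np.maximum(0.0, 1 - margins)   # max(0, 1 - yf(x))
logistic_loss = np.log1p(np.exp(-margins))     # log(1 + e^{-yf(x)})

for m, s, h, l in zip(margins, square_loss, hinge_loss, logistic_loss):
    print(f"yf(x) = {m:6.1f}   square = {s:8.1f}   hinge = {h:5.1f}   logistic = {l:10.2e}")
</syntaxhighlight>

At <math>yf(\vec{x}) = 20</math> the square loss is 361 even though the prediction is correct, while the hinge loss is 0 and the logistic loss is negligible.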
A benefit of the square loss function is that its structure lends itself to easy cross-validation of regularization parameters. Specifically, for [[Tikhonov regularization]], one can solve for the regularization parameter using leave-one-out [[cross-validation (statistics)|cross-validation]] in the same time it would take to solve a single problem.<ref>{{Citation |last=Rifkin |first=Ryan M. |last2=Lippert |first2=Ross A. |title=Notes on Regularized Least Squares |publisher=MIT Computer Science and Artificial Intelligence Laboratory |date=1 May 2007 |url=https://dspace.mit.edu/bitstream/handle/1721.1/37318/MIT-CSAIL-TR-2007-025.pdf?sequence=1}}</ref>
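One way this shortcut can be realized is through the identity that each leave-one-out residual equals the ordinary residual divided by <math>1 - H_{ii}</math>, where <math>H</math> is the regularized hat matrix. The NumPy sketch below illustrates this identity under an assumed ridge-regression setup (the function name <code>ridge_loocv_error</code> and the data are illustrative; it performs one solve per candidate parameter, whereas the cited notes describe more efficient variants that reuse a single decomposition across all parameter values):

<syntaxhighlight lang="python">
import numpy as np

def ridge_loocv_error(X, y, lam):
    """Mean leave-one-out squared error for Tikhonov-regularized
    least squares, computed from a single fit on the full data set."""
    n, d = X.shape
    # A = (X^T X + lam I)^{-1} X^T, so that w = A y and H = X A.
    A = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T)
    H = X @ A                          # regularized hat matrix, shape (n, n)
    residuals = y - H @ y              # in-sample residuals
    loo_residuals = residuals / (1.0 - np.diag(H))
    return np.mean(loo_residuals ** 2)

# Choose the regularization parameter by scanning a grid of values.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)
lams = np.logspace(-4, 2, 25)
best_lam = min(lams, key=lambda lam: ridge_loocv_error(X, y, lam))
</syntaxhighlight>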
The minimizer of <math>I[f]</math> for the square loss function is
:<math>f^*_\text{Square}(\vec{x}) = 2p(1\mid\vec{x}) - 1.</math>
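This follows from minimizing the conditional expected loss pointwise. Writing <math>\eta = p(1\mid\vec{x})</math>,
:<math>\mathbb{E}\left[\left(1 - yf(\vec{x})\right)^2 \,\middle|\, \vec{x}\right] = \eta\left(1 - f(\vec{x})\right)^2 + (1-\eta)\left(1 + f(\vec{x})\right)^2,</math>
and setting the derivative with respect to <math>f(\vec{x})</math> to zero gives
:<math>-2\eta\left(1 - f(\vec{x})\right) + 2(1-\eta)\left(1 + f(\vec{x})\right) = 0 \quad\Longrightarrow\quad f(\vec{x}) = 2\eta - 1 = 2p(1\mid\vec{x}) - 1.</math>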