While more commonly used in regression, the square loss function can be rewritten as a function <math>\phi(yf(\vec{x}))</math> and used for classification. Defined as
:<math>V(f(\vec{x}),y) = (1-yf(\vec{x}))^2</math>
the square loss function is both convex and smooth and matches the 0-1 [[indicator function]] when <math>yf(\vec{x}) = 0</math> and when <math>yf(\vec{x}) = 1</math>. However, the square loss function tends to penalize outliers excessively, leading to slower convergence rates than for the logistic loss or hinge loss functions. In addition, functions which yield high values of <math>f(\vec{x})</math> for some <math>x \in X</math> will perform poorly with the square loss function, since high values of <math>yf(\vec{x})</math> are penalized severely, regardless of whether the signs of <math>y</math> and <math>f(\vec{x})</math> agree.
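The behavior described above can be sketched numerically. The margin-based form <math>\phi(yf(\vec{x})) = (1-yf(\vec{x}))^2</math> comes from the definition in the text; the function names below are illustrative, not from any particular library:

```python
import numpy as np

def square_loss(margin):
    # V(f(x), y) = (1 - y f(x))^2, written in terms of the margin m = y f(x)
    return (1.0 - margin) ** 2

def zero_one_loss(margin):
    # 0-1 indicator: loss 1 when the sign of f(x) disagrees with y, else 0
    return np.where(margin <= 0, 1.0, 0.0)

margins = np.array([0.0, 1.0, -3.0, 4.0])
print(square_loss(margins))    # [ 1.  0. 16.  9.]
print(zero_one_loss(margins))  # [1. 0. 1. 0.]
```

The two losses agree at margins 0 and 1, but a point classified correctly with a large margin (here 4) still incurs square loss 9, illustrating how the square loss punishes large values of <math>yf(\vec{x})</math> even when the sign is correct.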
== Hinge loss ==