Proximal gradient methods for learning

Consider the [[Tikhonov regularization]] problem with square loss and with the <math>\ell_1</math> norm as the regularization penalty:
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, </math>
where <math>x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math> The <math>\ell_1</math>-regularized problem is sometimes referred to as <i>lasso</i> ([[Least_squares#Lasso_method|least absolute shrinkage and selection operator]]).<ref name=tibshirani /> Such <math>\ell_1</math> regularization problems are interesting because they induce <i>sparse</i> solutions, that is, solutions <math>w</math> of the minimization problem with relatively few nonzero components. Lasso can be seen as a convex relaxation of the non-convex problem
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_0, </math>
where <math>\|w\|_0</math> denotes the <math>\ell_0</math> "norm", which is the number of nonzero entries of the vector <math>w</math>. Sparse solutions are of particular interest in learning theory for interpretability of results: a sparse solution can identify a small number of important factors.<ref name=tibshirani>{{cite journal|last=Tibshirani|first=R.|title=Regression shrinkage and selection via the lasso|journal=Journal of the Royal Statistical Society, Series B|year=1996|volume=58|issue=1|pages=267–288}}</ref>
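
The lasso objective is exactly the composite form handled by proximal gradient methods: a smooth square-loss term plus the nonsmooth <math>\ell_1</math> penalty. The following is a minimal sketch of the resulting proximal gradient (ISTA) iteration, assuming NumPy; the names <code>lasso_ista</code> and <code>soft_threshold</code> are illustrative, and the soft-thresholding step corresponds to the <math>\ell_1</math> proximity operator derived in the next section.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (soft-thresholding), applied entrywise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_steps=500):
    """Proximal gradient (ISTA) for (1/n)||Xw - y||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L = (2/n) * largest eigenvalue of X^T X is the
    # Lipschitz constant of the gradient of the square-loss term.
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    gamma = 1.0 / L
    for _ in range(n_steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)                # gradient of the square loss
        w = soft_threshold(w - gamma * grad, gamma * lam)   # proximal step for the l1 penalty
    return w
</syntaxhighlight>

The fixed step size <math>1/L</math> is the standard choice guaranteeing convergence for a gradient with Lipschitz constant <math>L</math>; in practice a backtracking line search is often used instead.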
 
===Solving for <math>\ell_1</math> proximity operator===