Proximal gradient methods for learning

Consider the [[Tikhonov regularization]] problem with square loss and with the <math>\ell_1</math> norm as the regularization penalty:
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, </math>
where <math>x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math> The <math>\ell_1</math>-regularized problem is sometimes referred to as <i>lasso</i> ([[Least_squares#Lasso_method|least absolute shrinkage and selection operator]]).<ref name=tibshirani /> Such <math>\ell_1</math> regularization problems are interesting because they induce <i>sparse</i> solutions, that is, solutions <math>w</math> of the minimization problem with relatively few nonzero components. Lasso can be seen as a convex relaxation of the non-convex problem
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_0, </math>
where <math>\|w\|_0</math> denotes the <math>\ell_0</math> "norm", which is the number of nonzero entries of the vector <math>w</math>. Sparse solutions are of particular interest in learning theory for interpretability of results: a sparse solution can identify a small number of important factors.<ref name=tibshirani>{{cite journal|last=Tibshirani|first=R.|title=Regression shrinkage and selection via the lasso|journal=Journal of the Royal Statistical Society, Series B|year=1996|volume=58|issue=1|pages=267–288}}</ref>
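
The lasso objective is exactly the composite form handled by proximal gradient methods: a smooth square-loss term plus the nonsmooth <math>\ell_1</math> penalty. The following is a minimal sketch of the resulting proximal gradient (ISTA) iteration, assuming NumPy; the names <code>lasso_ista</code> and <code>soft_threshold</code> are illustrative, and the soft-thresholding step corresponds to the <math>\ell_1</math> proximity operator derived in the next section.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (soft-thresholding), applied entrywise."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_steps=500):
    """Proximal gradient (ISTA) for (1/n)||Xw - y||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    # Step size 1/L, where L = (2/n) * largest eigenvalue of X^T X is the
    # Lipschitz constant of the gradient of the square-loss term.
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    gamma = 1.0 / L
    for _ in range(n_steps):
        grad = (2.0 / n) * X.T @ (X @ w - y)                # gradient of the square loss
        w = soft_threshold(w - gamma * grad, gamma * lam)   # proximal step for the l1 penalty
    return w
</syntaxhighlight>

The fixed step size <math>1/L</math> is the standard choice guaranteeing convergence for a gradient with Lipschitz constant <math>L</math>; in practice a backtracking line search is often used instead.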
 
===Solving for <math>\ell_1</math> proximity operator===