==Lasso regularization==
 
Consider the [[Regularization_(mathematics)|regularized]] [[Empirical_risk_minimization|empirical risk minimization]] problem with square loss and with the <math>\ell_1</math> norm as the regularization penalty:
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i - \langle w,x_i\rangle)^2 + \lambda \|w\|_1,</math>
where <math>x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math> The <math>\ell_1</math> regularization penalty is sometimes referred to as <i>lasso</i> ([[Least_squares#Lasso_method|least absolute shrinkage and selection operator]]).<ref name=tibshirani /> Such <math>\ell_1</math> regularization problems are interesting because they induce <i>sparse</i> solutions, that is, solutions <math>w</math> to the minimization problem have relatively few nonzero components; a sketch of the corresponding proximal gradient iteration is given below.
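To make the iteration concrete, the following is a minimal sketch of the proximal gradient (ISTA) iteration for this problem, assuming a fixed step size <math>1/L</math> with <math>L = \tfrac{2}{n}\|X\|_2^2</math>, the Lipschitz constant of the gradient of the square loss; the helper names <code>soft_threshold</code> and <code>lasso_ista</code> and the iteration count are illustrative choices, not fixed by the problem itself. The proximal operator of <math>\lambda\|\cdot\|_1</math> is elementwise soft thresholding.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t*||.||_1: shrink each component toward zero by t.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    # Proximal gradient (ISTA) for min_w (1/n)*||Xw - y||^2 + lam*||w||_1.
    n, d = X.shape
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1/L, L = (2/n)*sigma_max(X)^2
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)             # gradient of the square loss
        w = soft_threshold(w - step * grad, step * lam)  # proximal step
    return w
</syntaxhighlight>

Because soft thresholding sets small components exactly to zero, the iterates produced by such a scheme are typically sparse, illustrating the shrinkage and selection behavior described above. Lasso can be seen to be a convex relaxation of the non-convex problem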