Proximal gradient methods for learning

where <math>\|w\|_0</math> denotes the <math>\ell_0</math> "norm", which is the number of nonzero entries of the vector <math>w</math>. Sparse solutions are of particular interest in learning theory for interpretability of results: a sparse solution can identify a small number of important factors.<ref name=tibshirani>{{cite journal|last=Tibshirani|first=R.|title=Regression shrinkage and selection via the lasso|journal=J. R. Stat. Soc. Ser. B|year=1996|volume=58|series=1|issue=1|pages=267–288}}</ref>
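The count defined above can be illustrated with a minimal sketch in plain Python. The helper names <code>l0_norm</code> and <code>l1_norm</code> and the optional tolerance <code>tol</code> are assumptions introduced here for illustration; they are not part of the cited source. The two example vectors are chosen to have the same <math>\ell_1</math> norm but different numbers of nonzero entries.

```python
def l0_norm(w, tol=0.0):
    """Count the entries of w whose magnitude exceeds tol (the l0 "norm").

    tol is a hypothetical tolerance for floating-point data; with the
    default tol=0.0 this is exactly the number of nonzero entries.
    """
    return sum(1 for x in w if abs(x) > tol)


def l1_norm(w):
    """Sum of the absolute values of the entries of w."""
    return sum(abs(x) for x in w)


# Two illustrative vectors with equal l1 norm but different sparsity.
w_sparse = [0.0, 3.0, 0.0, 0.0, -1.5]
w_dense = [1.0, 1.0, 1.0, 1.0, 0.5]

print("sparse:", l0_norm(w_sparse), l1_norm(w_sparse))
print("dense: ", l0_norm(w_dense), l1_norm(w_dense))
```

In this sketch the sparse vector singles out two coordinates, which is the sense in which a sparse solution can point to a small number of important factors.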
 
=== Solving for <math>\ell_1</math> proximity operator ===
 
For simplicity we restrict our attention to the problem where <math>\lambda=1</math>. To solve the problem