With elastic net regularization, the minimization problem takes the form
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n \left(y_i-\langle w,x_i\rangle\right)^2+\lambda \left((1-\mu)\|w\|_1+\mu \|w\|_2^2\right),</math>
where <math>x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
For <math>0<\mu\leq 1</math> the penalty term <math>\lambda \left((1-\mu)\|w\|_1+\mu \|w\|_2^2\right)</math> is now strictly convex, and hence the minimization problem admits a unique solution. It has been observed that for sufficiently small <math>\mu > 0</math>, the additional penalty term <math>\mu \|w\|_2^2</math> acts as a preconditioner and can substantially improve convergence without adversely affecting the sparsity of solutions.<ref name=structSparse /><ref name=deMolElasticNet>{{cite journal|last=De Mol|first=C.|coauthors=De Vito, E., and Rosasco, L.|title=Elastic-net regularization in learning theory|journal=J. Complexity|year=2009|volume=25|issue=2|pages=201–230|doi=10.1016/j.jco.2009.01.002}}</ref>
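For illustration (with a step size <math>\gamma>0</math> and the soft-thresholding operator <math>S_\tau</math>, defined below, introduced here for this example), write <math>R(w)=\lambda\left((1-\mu)\|w\|_1+\mu\|w\|_2^2\right)</math>. The proximity operator of <math>\gamma R</math> can then be evaluated coordinate-wise in closed form:
:<math>\left(\operatorname{prox}_{\gamma R}(w)\right)_i = \frac{S_{\gamma\lambda(1-\mu)}(w_i)}{1+2\gamma\lambda\mu},\qquad S_{\tau}(w_i)=\operatorname{sign}(w_i)\max\{|w_i|-\tau,\,0\}.</math>
Each coordinate is soft-thresholded, exactly as for the lasso penalty, and then uniformly rescaled by <math>1/(1+2\gamma\lambda\mu)</math>: the thresholding preserves sparsity, while the rescaling reflects the improved conditioning contributed by the quadratic term.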
== Exploiting group structure ==
Proximal gradient methods provide a general framework that is applicable to a wide variety of problems in [[statistical learning theory]]. Problems in learning often involve data with additional structure that is known ''a priori''. In recent years there have been developments that incorporate information about group structure, yielding methods tailored to different applications. Here we survey a few such methods.