Proximal gradient methods for learning

:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
 
Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application.<ref name=combettes>{{cite journal|last=Combettes|first=Patrick L.|author2=Wajs, Valérie R. |title=Signal Recovery by Proximal Forward-Backward Splitting|journal=Multiscale Model. Simul.|year=2005|volume=4|issue=4|pages=1168–1200|url=http://epubs.siam.org/doi/abs/10.1137/050626090|doi=10.1137/050626090}}</ref><ref name=structSparse>{{cite journal|last=Mosci|first=S.|author2=Rosasco, L. |author3=Matteo, S. |author4=Verri, A. |author5=Villa, S. |title=Solving Structured Sparsity Regularization with Proximal Methods|journal=Machine Learning and Knowledge Discovery in Databases|year=2010|volume=6322|pages=418–433 |doi=10.1007/978-3-642-15883-4_27}}</ref> Such customized penalties can help to induce certain structure in problem solutions, such as ''sparsity'' (in the case of [[Lasso (statistics)|lasso]]) or ''group structure'' (in the case of [[Lasso (statistics)#Group LASSO|group lasso]]).
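
For instance, the lasso problem above can be solved by alternating a gradient step on the least-squares term with the proximity operator of the <math>\ell_1</math> penalty, which is entrywise soft thresholding. The following is a minimal sketch of this iteration (ISTA) using NumPy; the function names and the fixed step size derived from the Lipschitz constant are illustrative assumptions rather than part of any particular reference.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t*||.||_1: entrywise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) sketch for min_w (1/n)*||y - Xw||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    # Fixed step size from the Lipschitz constant of the least-squares gradient.
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / L
    for _ in range(n_iter):
        grad = 2.0 / n * X.T @ (X @ w - y)               # gradient of the smooth term
        w = soft_threshold(w - step * grad, step * lam)  # prox of the l1 term
    return w
</syntaxhighlight>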
 
== Relevant background ==
which for <math>\gamma=1</math> implies that <math>x = \operatorname{prox}_{\varphi}(x)+\operatorname{prox}_{\varphi^*}(x)</math>.<ref name=combettes /><ref name=moreau>{{cite journal|last=Moreau|first=J.-J.|title=Fonctions convexes duales et points proximaux dans un espace hilbertien|journal=C. R. Acad. Sci. Paris Ser. A Math.|year=1962|volume=255|pages=2897–2899}}</ref> The Moreau decomposition can be seen as a generalization of the usual orthogonal decomposition of a vector space, analogous to the fact that proximity operators are generalizations of projections.<ref name=combettes />
 
In certain situations it may be easier to compute the proximity operator for the conjugate <math>\varphi^*</math> instead of the function <math>\varphi</math>, and therefore the Moreau decomposition can be applied. This is the case for [[Lasso (statistics)#Group LASSO|group lasso]].
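
As a concrete check of the decomposition with <math>\gamma=1</math>: for <math>\varphi = \|\cdot\|_1</math> the proximity operator is entrywise soft thresholding, while <math>\varphi^*</math> is the indicator function of the <math>\ell_\infty</math> unit ball, whose proximity operator is the projection onto that ball (clipping to <math>[-1,1]</math>). The short NumPy sketch below, an illustrative example not taken from the cited sources, verifies that the two operators sum back to the input.

<syntaxhighlight lang="python">
import numpy as np

def prox_l1(x):
    # Proximity operator of ||.||_1: entrywise soft thresholding with threshold 1.
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def prox_l1_conjugate(x):
    # The conjugate of ||.||_1 is the indicator of the l-infinity unit ball,
    # so its proximity operator is the projection onto that ball.
    return np.clip(x, -1.0, 1.0)

x = np.array([2.5, -0.3, 0.0, -4.0, 1.0])
# Moreau decomposition with gamma = 1: x = prox_phi(x) + prox_phi_star(x)
print(np.allclose(x, prox_l1(x) + prox_l1_conjugate(x)))  # True
</syntaxhighlight>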
 
== Lasso regularization ==
=== Other group structures ===
 
In contrast to the group lasso problem, where features are grouped into disjoint blocks, it may be the case that grouped features are overlapping or have a nested structure. Such generalizations of group lasso have been considered in a variety of contexts.<ref>{{cite journal|last=Chen|first=X.|author2=Lin, Q. |author3=Kim, S. |author4=Carbonell, J.G. |author5=Xing, E.P. |title=Smoothing proximal gradient method for general structured sparse regression|journal=Ann. Appl. Stat.|year=2012|volume=6|issue=2|pages=719–752|doi=10.1214/11-AOAS514}}</ref><ref>{{cite journal|last=Mosci|first=S.|author2=Villa, S. |author3=Verri, A. |author4=Rosasco, L. |title=A primal-dual algorithm for group sparse regularization with overlapping groups|journal=NIPS|year=2010|volume=23|pages=2604–2612}}</ref><ref name=nest>{{cite journal|last=Jenatton|first=R.|author2=Audibert, J.-Y. |author3=Bach, F. |title=Structured variable selection with sparsity-inducing norms|journal=J. Mach. Learn. Res.|year=2011|volume=12|pages=2777–2824}}</ref><ref>{{cite journal|last=Zhao|first=P.|author2=Rocha, G. |author3=Yu, B. |title=The composite absolute penalties family for grouped and hierarchical variable selection|journal=Ann. Stat.|year=2009|volume=37|issue=6A|pages=3468–3497|doi=10.1214/07-AOS584}}</ref> For overlapping groups one common approach is known as ''latent group lasso'', which introduces latent variables to account for overlap, as sketched below.<ref>{{cite journal|last=Obozinski|first=G.|author2=Jacob, L. |author3=Vert, J.-P. |title=Group lasso with overlaps: the latent group lasso approach|journal=INRIA Technical Report|year=2011|url=http://hal.inria.fr/inria-00628498/en/}}</ref><ref>{{cite journal|last=Villa|first=S.|author2=Rosasco, L. |author3=Mosci, S. |author4=Verri, A. |title=Proximal methods for the latent group lasso penalty|journal=preprint|year=2012|url=http://arxiv.org/abs/1209.0368}}</ref> Nested group structures are studied in ''hierarchical structure prediction'' and with [[directed acyclic graph]]s.<ref name=nest />
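
One way to make the latent-variable idea concrete is to duplicate the columns of overlapping groups so that the groups become disjoint blocks in an expanded space, run an ordinary group lasso there (block soft thresholding as the proximity operator), and recover the coefficient vector as the sum of the latent copies. The NumPy sketch below illustrates this under those assumptions; the group index sets, step size, and function names are hypothetical and not taken from the cited papers.

<syntaxhighlight lang="python">
import numpy as np

def block_soft_threshold(v, t):
    # Proximity operator of t*||.||_2 on one group: block soft thresholding.
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def latent_group_lasso(X, y, groups, lam, n_iter=500):
    # Duplicate the columns of each (possibly overlapping) group so the groups
    # become disjoint blocks of an expanded design matrix.
    n, d = X.shape
    X_exp = np.hstack([X[:, g] for g in groups])
    blocks, start = [], 0
    for g in groups:
        blocks.append(list(range(start, start + len(g))))
        start += len(g)
    v = np.zeros(X_exp.shape[1])
    step = n / (2.0 * np.linalg.norm(X_exp, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 / n * X_exp.T @ (X_exp @ v - y)    # gradient of the smooth term
        z = v - step * grad
        v = np.concatenate([block_soft_threshold(z[b], step * lam) for b in blocks])
    # The observed coefficients are the sum of the latent copies.
    w = np.zeros(d)
    for g, b in zip(groups, blocks):
        w[g] += v[b]
    return w

# Example call with two overlapping groups over four features (hypothetical index sets):
# w = latent_group_lasso(X, y, groups=[[0, 1, 2], [2, 3]], lam=0.1)
</syntaxhighlight>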
 
== See also ==