Proximal gradient methods for learning

:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
 
Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties that are tailored to a specific problem application.<ref name=combettes>{{cite journal|last=Combettes|first=Patrick L.|author2=Wajs, Valérie R. |title=Signal Recovery by Proximal Forward-Backward Splitting|journal=Multiscale Model. Simul.|year=2005|volume=4|issue=4|pages=1168–1200|url=http://epubs.siam.org/doi/abs/10.1137/050626090|doi=10.1137/050626090}}</ref><ref name=structSparse>{{cite journal|last=Mosci|first=S.|author2=Rosasco, L. |author3=Matteo, S. |author4=Verri, A. |author5=Villa, S. |title=Solving Structured Sparsity Regularization with Proximal Methods|journal=Machine Learning and Knowledge Discovery in Databases|year=2010|volume=6322|pages=418–433 |doi=10.1007/978-3-642-15883-4_27}}</ref> Such customized penalties can help to induce certain structure in problem solutions, such as ''sparsity'' (in the case of [[Lasso (statistics)|lasso]]) or ''group structure'' (in the case of [[Lasso (statistics)#Group LASSO|group lasso]]).
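
For instance, the lasso problem above can be solved by alternating a gradient step on the least-squares term with the proximity operator of the <math>\ell_1</math> penalty, which is entrywise soft thresholding. The following is a minimal sketch of this iteration (ISTA) using NumPy; the function names and the fixed step size derived from the Lipschitz constant are illustrative assumptions rather than part of any particular reference.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t*||.||_1: entrywise soft thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    """Proximal gradient (ISTA) sketch for min_w (1/n)*||y - Xw||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    # Fixed step size from the Lipschitz constant of the least-squares gradient.
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / L
    for _ in range(n_iter):
        grad = 2.0 / n * X.T @ (X @ w - y)               # gradient of the smooth term
        w = soft_threshold(w - step * grad, step * lam)  # prox of the l1 term
    return w
</syntaxhighlight>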
 
== Relevant background ==
which for <math>\gamma=1</math> implies that <math>x = \operatorname{prox}_{\varphi}(x)+\operatorname{prox}_{\varphi^*}(x)</math>.<ref name=combettes /><ref name=moreau>{{cite journal|last=Moreau|first=J.-J.|title=Fonctions convexes duales et points proximaux dans un espace hilbertien|journal=C. R. Acad. Sci. Paris Ser. A Math.|year=1962|volume=255|pages=2897–2899}}</ref> The Moreau decomposition can be seen as a generalization of the usual orthogonal decomposition of a vector space, analogous to the fact that proximity operators are generalizations of projections.<ref name=combettes />
 
In certain situations it may be easier to compute the proximity operator for the conjugate <math>\varphi^*</math> instead of the function <math>\varphi</math>, and therefore the Moreau decomposition can be applied. This is the case for [[Lasso (statistics)#Group LASSO|group lasso]].
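
As a concrete check of the decomposition with <math>\gamma=1</math>: for <math>\varphi = \|\cdot\|_1</math> the proximity operator is entrywise soft thresholding, while <math>\varphi^*</math> is the indicator function of the <math>\ell_\infty</math> unit ball, whose proximity operator is the projection onto that ball (clipping to <math>[-1,1]</math>). The short NumPy sketch below, an illustrative example not taken from the cited sources, verifies that the two operators sum back to the input.

<syntaxhighlight lang="python">
import numpy as np

def prox_l1(x):
    # Proximity operator of ||.||_1: entrywise soft thresholding with threshold 1.
    return np.sign(x) * np.maximum(np.abs(x) - 1.0, 0.0)

def prox_l1_conjugate(x):
    # The conjugate of ||.||_1 is the indicator of the l-infinity unit ball,
    # so its proximity operator is the projection onto that ball.
    return np.clip(x, -1.0, 1.0)

x = np.array([2.5, -0.3, 0.0, -4.0, 1.0])
# Moreau decomposition with gamma = 1: x = prox_phi(x) + prox_phi_star(x)
print(np.allclose(x, prox_l1(x) + prox_l1_conjugate(x)))  # True
</syntaxhighlight>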
 
== Lasso regularization ==
=== Other group structures ===
 
In contrast to the group lasso problem, where features are grouped into disjoint blocks, it may be the case that grouped features are overlapping or have a nested structure. Such generalizations of group lasso have been considered in a variety of contexts.<ref>{{cite journal|last=Chen|first=X.|author2=Lin, Q. |author3=Kim, S. |author4=Carbonell, J.G. |author5=Xing, E.P. |title=Smoothing proximal gradient method for general structured sparse regression|journal=Ann. Appl. Stat.|year=2012|volume=6|issue=2|pages=719–752|doi=10.1214/11-AOAS514}}</ref><ref>{{cite journal|last=Mosci|first=S.|author2=Villa, S. |author3=Verri, A. |author4=Rosasco, L. |title=A primal-dual algorithm for group sparse regularization with overlapping groups|journal=NIPS|year=2010|volume=23|pages=2604–2612}}</ref><ref name=nest>{{cite journal|last=Jenatton|first=R.|author2=Audibert, J.-Y. |author3=Bach, F. |title=Structured variable selection with sparsity-inducing norms|journal=J. Mach. Learn. Res.|year=2011|volume=12|pages=2777–2824}}</ref><ref>{{cite journal|last=Zhao|first=P.|author2=Rocha, G. |author3=Yu, B. |title=The composite absolute penalties family for grouped and hierarchical variable selection|journal=Ann. Stat.|year=2009|volume=37|issue=6A|pages=3468–3497|doi=10.1214/07-AOS584}}</ref> For overlapping groups one common approach is known as ''latent group lasso'', which introduces latent variables to account for overlap, as sketched below.<ref>{{cite journal|last=Obozinski|first=G.|author2=Jacob, L. |author3=Vert, J.-P. |title=Group lasso with overlaps: the latent group lasso approach|journal=INRIA Technical Report|year=2011|url=http://hal.inria.fr/inria-00628498/en/}}</ref><ref>{{cite journal|last=Villa|first=S.|author2=Rosasco, L. |author3=Mosci, S. |author4=Verri, A. |title=Proximal methods for the latent group lasso penalty|journal=preprint|year=2012|url=http://arxiv.org/abs/1209.0368}}</ref> Nested group structures are studied in ''hierarchical structure prediction'' and with [[directed acyclic graph]]s.<ref name=nest />
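
One way to make the latent-variable idea concrete is to duplicate the columns of overlapping groups so that the groups become disjoint blocks in an expanded space, run an ordinary group lasso there (block soft thresholding as the proximity operator), and recover the coefficient vector as the sum of the latent copies. The NumPy sketch below illustrates this under those assumptions; the group index sets, step size, and function names are hypothetical and not taken from the cited papers.

<syntaxhighlight lang="python">
import numpy as np

def block_soft_threshold(v, t):
    # Proximity operator of t*||.||_2 on one group: block soft thresholding.
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def latent_group_lasso(X, y, groups, lam, n_iter=500):
    # Duplicate the columns of each (possibly overlapping) group so the groups
    # become disjoint blocks of an expanded design matrix.
    n, d = X.shape
    X_exp = np.hstack([X[:, g] for g in groups])
    blocks, start = [], 0
    for g in groups:
        blocks.append(list(range(start, start + len(g))))
        start += len(g)
    v = np.zeros(X_exp.shape[1])
    step = n / (2.0 * np.linalg.norm(X_exp, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = 2.0 / n * X_exp.T @ (X_exp @ v - y)    # gradient of the smooth term
        z = v - step * grad
        v = np.concatenate([block_soft_threshold(z[b], step * lam) for b in blocks])
    # The observed coefficients are the sum of the latent copies.
    w = np.zeros(d)
    for g, b in zip(groups, blocks):
        w[g] += v[b]
    return w

# Example call with two overlapping groups over four features (hypothetical index sets):
# w = latent_group_lasso(X, y, groups=[[0, 1, 2], [2, 3]], lam=0.1)
</syntaxhighlight>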
 
== See also ==