Proximal gradient methods for learning

One common example is <math>\ell_1</math>-regularized (lasso) least squares:
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
 
Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory with penalties tailored to a specific problem application.<ref name=combettes>{{cite journal|last=Combettes|first=Patrick L.|author2=Wajs, Valérie R. |title=Signal Recovery by Proximal Forward-Backward Splitting|journal=Multiscale Model. Simul.|year=2005|volume=4|issue=4|pages=1168–1200|doi=10.1137/050626090|url=https://semanticscholar.org/paper/56974187b4d9a8757f4d8a6fd6facc8b4ad08240}}</ref><ref name=structSparse>{{cite journal|last=Mosci|first=S.|author2=Rosasco, L. |author3=Santoro, M. |author4=Verri, A. |author5=Villa, S. |title=Solving Structured Sparsity Regularization with Proximal Methods|journal=Machine Learning and Knowledge Discovery in Databases|year=2010|volume=6322|pages=418–433|doi=10.1007/978-3-642-15883-4_27|series=Lecture Notes in Computer Science|isbn=978-3-642-15882-7}}</ref> Such customized penalties can help to induce certain structure in problem solutions, such as ''sparsity'' (in the case of [[Lasso (statistics)|lasso]]) or ''group structure'' (in the case of [[Lasso (statistics)#Group LASSO|group lasso]]).
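
For illustration, a basic proximal gradient (forward–backward, ISTA-style) iteration for the lasso problem above alternates a gradient step on the smooth least-squares term with soft-thresholding, the proximal step for the <math>\ell_1</math> penalty. The sketch below is a minimal example, assuming NumPy; the function names, the fixed step size derived from a Lipschitz constant, and the iteration count are illustrative choices, not part of any particular reference implementation.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, kappa):
    # Proximal operator of kappa * ||.||_1: component-wise soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - kappa, 0.0)

def ista_lasso(X, y, lam, n_iter=500):
    # Proximal gradient for (1/n) * ||X w - y||^2 + lam * ||w||_1.
    n, d = X.shape
    w = np.zeros(d)
    # Fixed step size 1/L, where L is a Lipschitz constant of the gradient
    # of the smooth least-squares term.
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n
    t = 1.0 / L
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)        # gradient step on the smooth term
        w = soft_threshold(w - t * grad, t * lam)   # proximal step on the l1 term
    return w
</syntaxhighlight>

Accelerated and adaptive step-size variants, discussed below, follow the same gradient-then-proximal template.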
 
== Relevant background ==
== Practical considerations ==
 
There have been numerous developments within the past decade in [[convex optimization]] techniques which have influenced the application of proximal gradient methods in statistical learning theory. Here we survey a few important topics which can greatly improve practical algorithmic performance of these methods.<ref name=structSparse /><ref name=bach>{{cite journal|last=Bach|first=F.|author2=Jenatton, R. |author3=Mairal, J. |author4=Obozinski, G. |title=Optimization with sparsity-inducing penalties|journal=Foundations and Trends in Machine Learning|year=2011|volume=4|issue=1|pages=1–106|doi=10.1561/2200000015|arxiv=1108.0775|bibcode=2011arXiv1108.0775B}}</ref>
 
=== Adaptive step size ===
=== Group lasso ===
 
Group lasso is a generalization of the [[Lasso (statistics)|lasso method]] when features are grouped into disjoint blocks.<ref name=groupLasso>{{cite journal|last=Yuan|first=M.|author2=Lin, Y. |title=Model selection and estimation in regression with grouped variables|journal=J. R. Stat. Soc. B|year=2006|volume=68|issue=1|pages=49–67|doi=10.1111/j.1467-9868.2005.00532.x|url=https://semanticscholar.org/paper/d98ef875e2cbde3e2cc8fad521e3cbfe1bddbd69}}</ref> Suppose the features are grouped into blocks <math>\{w_1,\ldots,w_G\}</math>. Here we take as a regularization penalty
 
:<math>R(w) =\sum_{g=1}^G \|w_g\|_2,</math>
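
The proximal operator of this penalty acts on each block separately, shrinking the whole block toward zero and setting it exactly to zero when its norm falls below the threshold (block-wise soft-thresholding). A minimal sketch, assuming NumPy and an illustrative list <code>groups</code> of index arrays describing the disjoint blocks:

<syntaxhighlight lang="python">
import numpy as np

def prox_group_lasso(v, groups, kappa):
    # Proximal operator of kappa * sum_g ||w_g||_2 for disjoint blocks:
    # each block is scaled toward zero, and zeroed out if its norm is below kappa.
    w = np.zeros_like(v)
    for g in groups:
        norm_g = np.linalg.norm(v[g])
        if norm_g > kappa:
            w[g] = (1.0 - kappa / norm_g) * v[g]
    return w
</syntaxhighlight>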
=== Other group structures ===
 
In contrast to the group lasso problem, where features are grouped into disjoint blocks, it may be the case that grouped features are overlapping or have a nested structure. Such generalizations of group lasso have been considered in a variety of contexts.<ref>{{cite journal|last=Chen|first=X.|author2=Lin, Q. |author3=Kim, S. |author4=Carbonell, J.G. |author5=Xing, E.P. |title=Smoothing proximal gradient method for general structured sparse regression|journal=Ann. Appl. Stat.|year=2012|volume=6|issue=2|pages=719–752|doi=10.1214/11-AOAS514|arxiv=1005.4717}}</ref><ref>{{cite journal|last=Mosci|first=S.|author2=Villa, S. |author3=Verri, A. |author4=Rosasco, L. |title=A primal-dual algorithm for group sparse regularization with overlapping groups|journal=NIPS|year=2010|volume=23|pages=2604–2612}}</ref><ref name=nest>{{cite journal|last=Jenatton|first=R. |author2=Audibert, J.-Y. |author3=Bach, F. |title=Structured variable selection with sparsity-inducing norms|journal=J. Mach. Learn. Res.|year=2011|volume=12|pages=2777–2824|bibcode=2009arXiv0904.3523J |arxiv=0904.3523 }}</ref><ref>{{cite journal|last=Zhao|first=P.|author2=Rocha, G. |author3=Yu, B. |title=The composite absolute penalties family for grouped and hierarchical variable selection|journal=Ann. Stat.|year=2009|volume=37|issue=6A|pages=3468–3497|doi=10.1214/07-AOS584|arxiv=0909.0411|bibcode=2009arXiv0909.0411Z}}</ref> For overlapping groups one common approach is known as ''latent group lasso'', which introduces latent variables to account for overlap.<ref>{{cite journal|last=Obozinski|first=G. |author2=Jacob, L. |author3=Vert, J.-P. |title=Group lasso with overlaps: the latent group lasso approach|journal=INRIA Technical Report|year=2011|url=http://hal.inria.fr/inria-00628498/en/|bibcode=2011arXiv1110.0413O |arxiv=1110.0413 }}</ref><ref>{{cite journal|last=Villa|first=S.|author2=Rosasco, L. |author3=Mosci, S. |author4=Verri, A. |title=Proximal methods for the latent group lasso penalty|journal=Preprint|year=2012|arxiv=1209.0368|bibcode=2012arXiv1209.0368V}}</ref> Nested group structures are studied in ''hierarchical structure prediction'' and with [[directed acyclic graph]]s.<ref name=nest />
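
As an illustration of the overlapping case, the latent group lasso penalty can be written (in one standard formulation; the notation below is a sketch rather than a quotation from the cited papers) as a minimum over decompositions of <math>w</math> into latent vectors, each supported on a single group:

:<math>\Omega(w) = \min_{v_1,\ldots,v_G}\ \sum_{g=1}^G \|v_g\|_2 \quad \text{ subject to } \quad \sum_{g=1}^G v_g = w \ \text{ and } \ \operatorname{supp}(v_g)\subseteq g \text{ for each } g.</math>

When the groups are disjoint, this reduces to the group lasso penalty above.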
 
== See also ==