Proximal gradient methods for learning

In the fixed point iteration scheme
:<math>w^{k+1} = \operatorname{prox}_{\gamma R}\left(w^k-\gamma \nabla F\left(w^k\right)\right),</math>
one can allow a variable step size <math>\gamma_k</math> in place of a constant <math>\gamma</math>. Numerous adaptive step size schemes have been proposed in the literature.<ref name=combettes /><ref name=bauschke /><ref>{{cite journal|last=Loris|first=I. |author2=Bertero, M. |author3=De Mol, C. |author4=Zanella, R. |author5=Zanni, L. |title=Accelerating gradient projection methods for <math>\ell_1</math>-constrained signal recovery by steplength selection rules|journal=Applied & Comp. Harmonic Analysis|volume=27|issue=2|pages=247–254|year=2009|doi=10.1016/j.acha.2009.02.003}}</ref><ref>{{cite journal|last=Wright|first=S.J.|author2=Nowak, R.D. |author3=Figueiredo, M.A.T. |title=Sparse reconstruction by separable approximation|journal=IEEE Trans. Signal Process.|year=2009|volume=57|issue=7|pages=2479–2493|doi=10.1109/TSP.2009.2016892|bibcode=2009ITSP...57.2479W}}</ref> Applications of these schemes<ref name=structSparse /><ref>{{cite journal|last=Loris|first=Ignace|title=On the performance of algorithms for the minimization of <math>\ell_1</math>-penalized functionals|journal=Inverse Problems|year=2009|volume=25|issue=3|doi=10.1088/0266-5611/25/3/035008|page=035008|arxiv=0710.4082|bibcode=2009InvPr..25c5008L}}</ref> suggest that they can offer a substantial improvement in the number of iterations required for fixed-point convergence.
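
To illustrate how a variable step size can enter the iteration above, the following sketch (an illustration only, not taken from the cited references) applies the proximal gradient iteration with a simple backtracking rule to the lasso problem with <math>F(w)=\tfrac{1}{2}\|Xw-y\|_2^2</math> and <math>R(w)=\lambda\|w\|_1</math>; the function names and parameter choices are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch: proximal gradient iteration
#   w^{k+1} = prox_{gamma_k R}(w^k - gamma_k * grad F(w^k))
# for F(w) = 0.5*||Xw - y||^2 and R(w) = lam*||w||_1, with backtracking
# used as one example of a variable step size gamma_k.

def soft_threshold(v, t):
    """Proximity operator of t*||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_grad_backtracking(X, y, lam, iters=200, gamma0=1.0, beta=0.5):
    w = np.zeros(X.shape[1])
    gamma = gamma0
    for _ in range(iters):
        grad = X.T @ (X @ w - y)                      # gradient of the smooth term F
        F_w = 0.5 * np.linalg.norm(X @ w - y) ** 2
        while True:
            w_new = soft_threshold(w - gamma * grad, gamma * lam)
            diff = w_new - w
            F_new = 0.5 * np.linalg.norm(X @ w_new - y) ** 2
            # Shrink gamma until the standard quadratic upper bound on F holds.
            if F_new <= F_w + grad @ diff + diff @ diff / (2.0 * gamma):
                break
            gamma *= beta
        w = w_new
    return w

# Example usage on random data.
rng = np.random.default_rng(0)
X, y = rng.standard_normal((50, 20)), rng.standard_normal(50)
w_hat = prox_grad_backtracking(X, y, lam=0.1)
</syntaxhighlight>

Backtracking is only one possible choice; the works cited above discuss other steplength selection rules.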
 
=== Elastic net (mixed norm regularization) ===
=== Other group structures ===
 
In contrast to the group lasso problem, where features are grouped into disjoint blocks, grouped features may instead overlap or have a nested structure. Such generalizations of the group lasso have been considered in a variety of contexts.<ref>{{cite journal|last=Chen|first=X.|author2=Lin, Q. |author3=Kim, S. |author4=Carbonell, J.G. |author5=Xing, E.P. |title=Smoothing proximal gradient method for general structured sparse regression|journal=Ann. Appl. Stat.|year=2012|volume=6|issue=2|pages=719–752|doi=10.1214/11-AOAS514|arxiv=1005.4717}}</ref><ref>{{cite journal|last=Mosci|first=S.|author2=Villa, S. |author3=Verri, A. |author4=Rosasco, L. |title=A primal-dual algorithm for group sparse regularization with overlapping groups|journal=NIPS|year=2010|volume=23|pages=2604–2612}}</ref><ref name=nest>{{cite journal|last=Jenatton|first=R. |author2=Audibert, J.-Y. |author3=Bach, F. |title=Structured variable selection with sparsity-inducing norms|journal=J. Mach. Learn. Res.|year=2011|volume=12|pages=2777–2824}}</ref><ref>{{cite journal|last=Zhao|first=P.|author2=Rocha, G. |author3=Yu, B. |title=The composite absolute penalties family for grouped and hierarchical variable selection|journal=Ann. Stat.|year=2009|volume=37|issue=6A|pages=3468–3497|doi=10.1214/07-AOS584}}</ref> For overlapping groups one common approach is the ''latent group lasso'', which introduces latent variables to account for the overlap.<ref>{{cite journal|last=Obozinski|first=G. |author2=Laurent, J. |author3=Vert, J.-P. |title=Group lasso with overlaps: the latent group lasso approach|journal=INRIA Technical Report|year=2011|url=http://hal.inria.fr/inria-00628498/en/}}</ref><ref>{{cite journal|last=Villa|first=S.|author2=Rosasco, L. |author3=Mosci, S. |author4=Verri, A. |title=Proximal methods for the latent group lasso penalty|journal=preprint|year=2012|arxiv=1209.0368|bibcode=2012arXiv1209.0368V}}</ref> Nested group structures are studied in ''hierarchical structure prediction'' and with [[directed acyclic graph]]s.<ref name=nest />
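
As a rough sketch of the latent variable idea (an illustration under simplifying assumptions, not the algorithm of the cited papers), the following code duplicates each coordinate once per group containing it so that the groups become disjoint, applies the ordinary group lasso proximity operator (blockwise soft thresholding) to the latent copies, and sums the copies back to recover the original variable; the names and the toy group structure are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch of the latent variable idea behind the latent group
# lasso: duplicate each coordinate once per group containing it, so the
# groups become disjoint in the expanded ("latent") variable, then apply
# the ordinary group lasso prox blockwise and sum the copies back.

def group_soft_threshold(v, groups, t):
    """Blockwise prox of t * sum_g ||v_g||_2 for disjoint groups."""
    out = v.copy()
    for g in groups:
        norm = np.linalg.norm(v[g])
        out[g] = max(0.0, 1.0 - t / norm) * v[g] if norm > 0 else 0.0
    return out

# Toy overlapping groups on 5 original coordinates (hypothetical example).
overlapping_groups = [[0, 1, 2], [2, 3, 4]]

# Expanded representation: one latent copy of each coordinate per group.
copies = [i for g in overlapping_groups for i in g]      # duplicated indices
disjoint_groups, start = [], 0
for g in overlapping_groups:
    disjoint_groups.append(list(range(start, start + len(g))))
    start += len(g)

rng = np.random.default_rng(0)
v = rng.standard_normal(len(copies))                     # latent variables
v = group_soft_threshold(v, disjoint_groups, t=0.5)      # prox on disjoint blocks

w = np.zeros(5)                                          # original variable:
np.add.at(w, copies, v)                                  # sum of latent copies
</syntaxhighlight>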
 
== See also ==