Proximal gradient methods for learning

One such example is <math>\ell_1</math> regularization (also known as Lasso) of the form
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
 
Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory, with penalties tailored to a specific problem application.<ref name=combettes>{{cite journal|last=Combettes|first=Patrick L.|author2=Wajs, Valérie R. |title=Signal Recovery by Proximal Forward-Backward Splitting|journal=Multiscale Model. Simul.|year=2005|volume=4|issue=4|pages=1168–1200|url=http://epubs.siam.org/doi/abs/10.1137/050626090|doi=10.1137/050626090}}</ref><ref name=structSparse>{{cite journal|last=Mosci|first=S.|author2=Rosasco, L. |author3=Santoro, M. |author4=Verri, A. |author5=Villa, S. |title=Solving Structured Sparsity Regularization with Proximal Methods|journal=Machine Learning and Knowledge Discovery in Databases|year=2010|volume=6322|pages=418–433 |doi=10.1007/978-3-642-15883-4_27}}</ref> Such customized penalties can help to induce certain structure in problem solutions, such as ''sparsity'' (in the case of [[#Lasso regularization|lasso]]) or ''group structure'' (in the case of [[#Exploiting group structure|group lasso]]).
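
For concreteness, the lasso objective above can be evaluated directly. The following is a minimal sketch in Python, assuming NumPy arrays <code>X</code> (with rows <math>x_i</math>) and <code>y</code>, a candidate weight vector <code>w</code>, and a regularization parameter <code>lam</code>; these names are illustrative and do not come from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def lasso_objective(w, X, y, lam):
    """Evaluate (1/n) * sum_i (y_i - <w, x_i>)^2 + lam * ||w||_1."""
    residuals = y - X @ w                       # y_i - <w, x_i> for each sample
    return np.mean(residuals ** 2) + lam * np.sum(np.abs(w))
</syntaxhighlight>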
 
== Relevant background ==
The proximity operator of the scaled <math>\ell_1</math> norm acts componentwise: for <math>\gamma>0</math>,
:<math>\left(\operatorname{prox}_{\gamma \|\cdot\|_1}(x)\right)_i = \begin{cases} x_i-\gamma, & \text{if } x_i>\gamma \\ 0, & \text{if } |x_i|\leq \gamma \\ x_i+\gamma, & \text{if } x_i<-\gamma,
\end{cases}</math>
 
which is known as the [[Thresholding (image processing)|soft thresholding]] operator <math>S_{\gamma}(x)=\operatorname{prox}_{\gamma \|\cdot\|_1}(x)</math>.<ref name=combettes /><ref name=daubechies>{{cite journal|last=Daubechies|first=I.|author2=Defrise, M. |author3=De Mol, C. |title=An iterative thresholding algorithm for linear inverse problems with a sparsity constraint|journal=Comm. Pure Appl. Math.|year=2004|volume=57|issue=11|pages=1413–1457|doi=10.1002/cpa.20042}}</ref>
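
The soft thresholding operator has the equivalent closed form <math>S_\gamma(x)_i = \operatorname{sgn}(x_i)\max(|x_i|-\gamma, 0)</math>, which the following Python sketch implements (an illustration only; the function name <code>soft_threshold</code> is not from the cited sources).

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(x, gamma):
    """Componentwise soft thresholding: sgn(x_i) * max(|x_i| - gamma, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
</syntaxhighlight>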
 
=== Fixed point iterative schemes ===
In the fixed point iteration scheme
:<math>w^{k+1} = \operatorname{prox}_{\gamma R}\left(w^k-\gamma \nabla F\left(w^k\right)\right),</math>
one can allow a variable step size <math>\gamma_k</math> instead of a constant <math>\gamma</math>. Numerous adaptive step size schemes have been proposed throughout the literature.<ref name=combettes /><ref name=bauschke /><ref>{{cite journal|last=Loris|first=I.|author2=Bertero, M. |author3=De Mol, C. |author4=Zanella, R. |author5=Zanni, L. |title=Accelerating gradient projection methods for <math>\ell_1</math>-constrained signal recovery by steplength selection rules|journal=Appl. Comput. Harmon. Anal.|volume=27|issue=2|pages=247–254|year=2009|doi=10.1016/j.acha.2009.02.003}}</ref><ref>{{cite journal|last=Wright|first=S.J.|author2=Nowak, R.D. |author3=Figueiredo, M.A.T. |title=Sparse reconstruction by separable approximation|journal=IEEE Trans. Signal Process.|year=2009|volume=57|issue=7|pages=2479–2493|doi=10.1109/TSP.2009.2016892}}</ref> Applications of these schemes<ref name=structSparse /><ref>{{cite journal|last=Loris|first=Ignace|title=On the performance of algorithms for the minimization of <math>\ell_1</math>-penalized functionals|journal=Inverse Problems|year=2009|volume=25|issue=3|doi=10.1088/0266-5611/25/3/035008}}</ref> suggest that they can substantially reduce the number of iterations required for fixed point convergence.
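
A minimal sketch of this iteration for the lasso problem is given below, using a simple backtracking rule to select <math>\gamma_k</math>. This is only one of many possible step size rules, and the function name <code>ista_backtracking</code> and its default parameters are illustrative choices rather than anything specified in the cited references.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(x, gamma):
    """Proximity operator of gamma * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def ista_backtracking(X, y, lam, w0, n_iter=100, gamma0=1.0, beta=0.5):
    """Proximal gradient iteration w <- prox_{gamma*lam*||.||_1}(w - gamma*grad F(w))
    for F(w) = (1/n) ||y - Xw||^2, with gamma chosen by backtracking."""
    n = X.shape[0]
    w = w0.copy()
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)      # gradient of the smooth term F
        F_w = np.mean((y - X @ w) ** 2)
        gamma = gamma0
        while True:
            w_next = soft_threshold(w - gamma * grad, gamma * lam)
            diff = w_next - w
            F_next = np.mean((y - X @ w_next) ** 2)
            # Accept gamma once the quadratic upper bound on F holds at w_next
            if F_next <= F_w + grad @ diff + diff @ diff / (2 * gamma):
                break
            gamma *= beta                          # otherwise shrink the step size
        w = w_next
    return w
</syntaxhighlight>

Starting from, e.g., <code>w0 = np.zeros(X.shape[1])</code>, the iterates typically become exactly sparse, since the soft thresholding step sets small components to zero.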
 
=== Elastic net (mixed norm regularization) ===