Proximal gradient methods for learning

:<math>\operatorname{prox}_{\gamma \|\cdot\|_1}(x)_i = \begin{cases} x_i-\gamma, & x_i>\gamma \\ 0, & |x_i|\leq \gamma \\ x_i+\gamma, & x_i<-\gamma, \end{cases}</math>
 
which is known as the [[Thresholding (image processing)|soft thresholding]] operator <math>S_{\gamma}(x)=\operatorname{prox}_{\gamma \|\cdot\|_1}(x)</math>.<ref name=combettes /><ref name=daubechies>{{cite journal|last=Daubechies|first=I.|author2=Defrise, M.|author3=De Mol, C.|title=An iterative thresholding algorithm for linear inverse problems with a sparsity constraint|journal=Comm. Pure Appl. Math.|year=2004|volume=57|issue=11|pages=1413–1457|doi=10.1002/cpa.20042}}</ref>
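
A minimal NumPy sketch of this componentwise operator is given below; the function name <code>soft_threshold</code> and the example vector are illustrative choices, not part of any standard library.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(x, gamma):
    """Proximal operator of gamma * ||.||_1, applied componentwise:
    shrinks each entry of x toward zero by gamma and sets entries
    with magnitude at most gamma exactly to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

# Example: entries within gamma of zero vanish, the rest are shrunk.
x = np.array([3.0, -0.5, 0.2, -2.0])
print(soft_threshold(x, gamma=1.0))  # prints [ 2. -0.  0. -1.]
</syntaxhighlight>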
 
=== Fixed point iterative schemes ===
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \left((1-\mu)\|w\|_1+\mu \|w\|_2^2\right), </math>
where <math>x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
For <math>0<\mu\leq 1</math> the penalty term <math>\lambda \left((1-\mu)\|w\|_1+\mu \|w\|_2^2\right)</math> is now strictly convex, and hence the minimization problem admits a unique solution. It has been observed that for sufficiently small <math>\mu > 0</math>, the additional penalty term <math>\mu \|w\|_2^2</math> acts as a preconditioner and can substantially improve convergence while not adversely affecting the sparsity of solutions.<ref name=structSparse /><ref name=deMolElasticNet>{{cite journal|last=De Mol|first=C.|author2=De Vito, E.|author3=Rosasco, L.|title=Elastic-net regularization in learning theory|journal=J. Complexity|year=2009|volume=25|issue=2|pages=201–230|doi=10.1016/j.jco.2009.01.002}}</ref>
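
The sketch below illustrates a proximal gradient iteration for this elastic net problem. It uses the fact that the proximal operator of <math>\gamma\lambda \left((1-\mu)\|\cdot\|_1+\mu \|\cdot\|_2^2\right)</math> is soft thresholding with parameter <math>\gamma\lambda(1-\mu)</math> followed by rescaling by <math>1/(1+2\gamma\lambda\mu)</math>; the function names, step-size choice, and synthetic data are illustrative rather than prescribed by the method.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(x, t):
    # Componentwise proximal operator of t * ||.||_1.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def elastic_net_prox_grad(X, y, lam, mu, n_iter=500):
    """Proximal gradient iteration for
    min_w (1/n)*||Xw - y||^2 + lam*((1-mu)*||w||_1 + mu*||w||_2^2)."""
    n, d = X.shape
    # Step size 1/L, where L = (2/n)*||X||_2^2 is the Lipschitz
    # constant of the gradient of the least-squares term.
    gamma = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2 / n)
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient of the data term
        v = w - gamma * grad                   # gradient step
        # Proximal step: soft thresholding, then rescaling.
        w = soft_threshold(v, gamma * lam * (1 - mu)) / (1 + 2 * gamma * lam * mu)
    return w

# Illustrative usage with synthetic sparse data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [1.0, -2.0, 3.0]
y = X @ w_true + 0.1 * rng.standard_normal(100)
w_hat = elastic_net_prox_grad(X, y, lam=0.1, mu=0.1)
</syntaxhighlight>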
 
== Exploiting group structure ==