Proximal gradient methods for learning

One such example is <math>\ell_1</math> regularization (also known as Lasso) of the form
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
 
Proximal gradient methods offer a general framework for solving regularization problems from statistical learning theory, with penalties tailored to a specific problem application.<ref name=combettes>{{cite journal|last=Combettes|first=Patrick L.|author2=Wajs, Valérie R. |title=Signal Recovery by Proximal Forward-Backward Splitting|journal=Multiscale Model. Simul.|year=2005|volume=4|issue=4|pages=1168–1200|url=http://epubs.siam.org/doi/abs/10.1137/050626090|doi=10.1137/050626090}}</ref><ref name=structSparse>{{cite journal|last=Mosci|first=S.|author2=Rosasco, L. |author3=Santoro, M. |author4=Verri, A. |author5=Villa, S. |title=Solving Structured Sparsity Regularization with Proximal Methods|journal=Machine Learning and Knowledge Discovery in Databases|year=2010|volume=6322|pages=418–433 |doi=10.1007/978-3-642-15883-4_27}}</ref> Such customized penalties can help to induce certain structure in problem solutions, such as ''sparsity'' (in the case of [[#Lasso regularization|lasso]]) or ''group structure'' (in the case of [[#Exploiting group structure|group lasso]]).
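
For concreteness, the lasso objective above can be evaluated directly. The following is a minimal sketch in Python, assuming NumPy arrays <code>X</code> (with rows <math>x_i</math>) and <code>y</code>, a candidate weight vector <code>w</code>, and a regularization parameter <code>lam</code>; these names are illustrative and do not come from the cited sources.

<syntaxhighlight lang="python">
import numpy as np

def lasso_objective(w, X, y, lam):
    """Evaluate (1/n) * sum_i (y_i - <w, x_i>)^2 + lam * ||w||_1."""
    residuals = y - X @ w                       # y_i - <w, x_i> for each sample
    return np.mean(residuals ** 2) + lam * np.sum(np.abs(w))
</syntaxhighlight>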
 
== Relevant background ==
The proximity operator of the scaled <math>\ell_1</math> norm acts componentwise: for <math>\gamma>0</math>,
:<math>\left(\operatorname{prox}_{\gamma \|\cdot\|_1}(x)\right)_i = \begin{cases} x_i-\gamma, & \text{if } x_i>\gamma \\ 0, & \text{if } |x_i|\leq \gamma \\ x_i+\gamma, & \text{if } x_i<-\gamma,
\end{cases}</math>
 
which is known as the [[Thresholding (image processing)|soft thresholding]] operator <math>S_{\gamma}(x)=\operatorname{prox}_{\gamma \|\cdot\|_1}(x)</math>.<ref name=combettes /><ref name=daubechies>{{cite journal|last=Daubechies|first=I.|author2=Defrise, M. |author3=De Mol, C. |title=An iterative thresholding algorithm for linear inverse problems with a sparsity constraint|journal=Comm. Pure Appl. Math.|year=2004|volume=57|issue=11|pages=1413–1457|doi=10.1002/cpa.20042}}</ref>
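
The soft thresholding operator has the equivalent closed form <math>S_\gamma(x)_i = \operatorname{sgn}(x_i)\max(|x_i|-\gamma, 0)</math>, which the following Python sketch implements (an illustration only; the function name <code>soft_threshold</code> is not from the cited sources).

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(x, gamma):
    """Componentwise soft thresholding: sgn(x_i) * max(|x_i| - gamma, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)
</syntaxhighlight>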
 
=== Fixed point iterative schemes ===
In the fixed point iteration scheme
:<math>w^{k+1} = \operatorname{prox}_{\gamma R}\left(w^k-\gamma \nabla F\left(w^k\right)\right),</math>
one can allow a variable step size <math>\gamma_k</math> instead of a constant <math>\gamma</math>. Numerous adaptive step size schemes have been proposed throughout the literature.<ref name=combettes /><ref name=bauschke /><ref>{{cite journal|last=Loris|first=I.|author2=Bertero, M. |author3=De Mol, C. |author4=Zanella, R. |author5=Zanni, L. |title=Accelerating gradient projection methods for <math>\ell_1</math>-constrained signal recovery by steplength selection rules|journal=Appl. Comput. Harmon. Anal.|volume=27|issue=2|pages=247–254|year=2009|doi=10.1016/j.acha.2009.02.003}}</ref><ref>{{cite journal|last=Wright|first=S.J.|author2=Nowak, R.D. |author3=Figueiredo, M.A.T. |title=Sparse reconstruction by separable approximation|journal=IEEE Trans. Signal Process.|year=2009|volume=57|issue=7|pages=2479–2493|doi=10.1109/TSP.2009.2016892}}</ref> Applications of these schemes<ref name=structSparse /><ref>{{cite journal|last=Loris|first=Ignace|title=On the performance of algorithms for the minimization of <math>\ell_1</math>-penalized functionals|journal=Inverse Problems|year=2009|volume=25|issue=3|doi=10.1088/0266-5611/25/3/035008}}</ref> suggest that they can substantially reduce the number of iterations required for fixed point convergence.
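
A minimal sketch of this iteration for the lasso problem is given below, using a simple backtracking rule to select <math>\gamma_k</math>. This is only one of many possible step size rules, and the function name <code>ista_backtracking</code> and its default parameters are illustrative choices rather than anything specified in the cited references.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(x, gamma):
    """Proximity operator of gamma * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def ista_backtracking(X, y, lam, w0, n_iter=100, gamma0=1.0, beta=0.5):
    """Proximal gradient iteration w <- prox_{gamma*lam*||.||_1}(w - gamma*grad F(w))
    for F(w) = (1/n) ||y - Xw||^2, with gamma chosen by backtracking."""
    n = X.shape[0]
    w = w0.copy()
    for _ in range(n_iter):
        grad = (2.0 / n) * X.T @ (X @ w - y)      # gradient of the smooth term F
        F_w = np.mean((y - X @ w) ** 2)
        gamma = gamma0
        while True:
            w_next = soft_threshold(w - gamma * grad, gamma * lam)
            diff = w_next - w
            F_next = np.mean((y - X @ w_next) ** 2)
            # Accept gamma once the quadratic upper bound on F holds at w_next
            if F_next <= F_w + grad @ diff + diff @ diff / (2 * gamma):
                break
            gamma *= beta                          # otherwise shrink the step size
        w = w_next
    return w
</syntaxhighlight>

Starting from, e.g., <code>w0 = np.zeros(X.shape[1])</code>, the iterates typically become exactly sparse, since the soft thresholding step sets small components to zero.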
 
=== Elastic net (mixed norm regularization) ===