Proximal gradient methods for learning

 
'''Proximal gradient methods for learning''' is an area of research in [[statistical learning theory]] which studies algorithms for a general class of convex [[Regularization_(mathematics)|regularization]] problems where the regularization penalty may not be [[Differentiable function|differentiable]]. One such example is <math>\ell_1</math> regularization (also known as Lasso) of the form
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
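The following is a minimal sketch, not part of the original article, of how this objective can be evaluated on synthetic data; the variable names (<code>X</code>, <code>y</code>, <code>w</code>, <code>lam</code>) and the random data are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch: evaluate the lasso objective
#   (1/n) * sum_i (y_i - <w, x_i>)^2 + lambda * ||w||_1
# on synthetic data. All names and data here are assumptions for the example.

rng = np.random.default_rng(0)
n, d = 100, 20                      # n samples, d features
X = rng.standard_normal((n, d))     # rows of X are the x_i
w_true = np.zeros(d)
w_true[:3] = [1.5, -2.0, 0.7]       # sparse ground-truth weights
y = X @ w_true + 0.1 * rng.standard_normal(n)

def lasso_objective(w, X, y, lam):
    """(1/n) * ||y - X w||^2 + lam * ||w||_1"""
    residual = y - X @ w
    return residual @ residual / len(y) + lam * np.abs(w).sum()

print(lasso_objective(np.zeros(d), X, y, lam=0.1))
</syntaxhighlight>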
 
===Adaptive step size===
 
Consider a problem of the form <math>\min_w F(w) + R(w),</math> where <math>F(w)</math> is convex and differentiable and <math>R(w)</math> is convex (for example, regularized empirical loss minimization with a differentiable [[Loss_function|loss function]] takes this form; the lasso regularization considered above is one such example). This gives rise to the fixed point iteration scheme
:<math>w^{k+1} = \operatorname{prox}_{\gamma R}\left(w^k-\gamma \nabla F\left(w^k\right)\right).</math>
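As a concrete sketch of this iteration, the code below applies it to the lasso case, where <math>F(w)=\tfrac{1}{n}\|y-Xw\|^2</math> and <math>R(w)=\lambda\|w\|_1</math>, so that <math>\operatorname{prox}_{\gamma R}</math> is entrywise soft thresholding at level <math>\gamma\lambda</math>. The fixed step size <math>\gamma = 1/L</math>, with <math>L</math> the Lipschitz constant of <math>\nabla F</math>, is one standard choice assumed here; function names and the iteration count are illustrative.

<syntaxhighlight lang="python">
import numpy as np

# Sketch of the fixed point iteration
#   w^{k+1} = prox_{gamma R}(w^k - gamma * grad F(w^k))
# for F(w) = (1/n)||y - Xw||^2 and R(w) = lam * ||w||_1.

def soft_threshold(v, tau):
    """prox of tau * ||.||_1: shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_lasso(X, y, lam, num_iters=500):
    n, d = X.shape
    grad = lambda w: (2.0 / n) * X.T @ (X @ w - y)   # gradient of F
    L = 2.0 * np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of grad F
    gamma = 1.0 / L                                  # fixed step size (assumed choice)
    w = np.zeros(d)
    for _ in range(num_iters):
        w = soft_threshold(w - gamma * grad(w), gamma * lam)
    return w

# Example usage (illustrative):
# rng = np.random.default_rng(0)
# X, y = rng.standard_normal((100, 20)), rng.standard_normal(100)
# w_hat = proximal_gradient_lasso(X, y, lam=0.1)
</syntaxhighlight>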
One standard modification is to let the step size <math>\gamma</math> vary with the iteration <math>k</math>, giving the scheme
:<math>w^{k+1} = \operatorname{prox}_{\gamma_k R}\left(w^k-\gamma_k \nabla F\left(w^k\right)\right).</math>
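One common way to choose <math>\gamma_k</math> is a backtracking rule that shrinks the step until a quadratic upper bound on <math>F</math> holds at the candidate point; the sketch below shows this for the lasso case. This is only one possible adaptive scheme, and the backtracking constants (<code>beta</code>, the initial <code>gamma</code>) are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

# Sketch of the variable step size scheme
#   w^{k+1} = prox_{gamma_k R}(w^k - gamma_k * grad F(w^k)),
# with gamma_k chosen by backtracking: shrink gamma until
#   F(w_new) <= F(w) + <grad F(w), w_new - w> + ||w_new - w||^2 / (2 * gamma).

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient_backtracking(X, y, lam, num_iters=200, beta=0.5):
    n, d = X.shape
    F = lambda w: np.sum((y - X @ w) ** 2) / n       # smooth part of the objective
    grad = lambda w: (2.0 / n) * X.T @ (X @ w - y)   # its gradient
    w = np.zeros(d)
    gamma = 1.0                                      # initial step size (assumed)
    for _ in range(num_iters):
        g = grad(w)
        while True:
            w_new = soft_threshold(w - gamma * g, gamma * lam)
            diff = w_new - w
            # accept gamma_k once the quadratic upper bound holds
            if F(w_new) <= F(w) + g @ diff + diff @ diff / (2.0 * gamma):
                break
            gamma *= beta                            # otherwise shrink and retry
        w = w_new
    return w
</syntaxhighlight>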