'''Proximal gradient methods for learning''' is an area of research in [[statistical learning theory]] which studies algorithms for a general class of [[Convex function|convex]] [[Regularization (mathematics)|regularization]] problems where the regularization penalty may not be [[Differentiable function|differentiable]]. One such example is <math>\ell_1</math> regularization (also known as the [[Lasso (statistics)|lasso]]) of the form
:<math>\min_{w\in\mathbb{R}^d} \frac{1}{n}\sum_{i=1}^n (y_i- \langle w,x_i\rangle)^2+ \lambda \|w\|_1, \quad \text{ where } x_i\in \mathbb{R}^d\text{ and } y_i\in\mathbb{R}.</math>
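As an illustration, a minimal Python sketch of a proximal gradient (ISTA-style) iteration for the lasso problem above may look as follows; the soft-thresholding step is the proximal operator of the <math>\ell_1</math> penalty, and the function names and synthetic data are illustrative assumptions, not part of the article.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1: elementwise soft thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_proximal_gradient(X, y, lam, gamma, num_iters=500):
    # Proximal gradient (ISTA) iteration for
    #   min_w (1/n) * ||y - X w||^2 + lam * ||w||_1.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(num_iters):
        grad = (2.0 / n) * X.T @ (X @ w - y)               # gradient of the smooth least-squares term
        w = soft_threshold(w - gamma * grad, gamma * lam)  # proximal step on the l1 penalty
    return w

# Illustrative usage with synthetic data; the step size 1/L uses the
# Lipschitz constant L of the gradient of the least-squares term.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.0, 0.5]
y = X @ w_true + 0.01 * rng.standard_normal(100)
L = (2.0 / X.shape[0]) * np.linalg.norm(X, 2) ** 2
w_hat = lasso_proximal_gradient(X, y, lam=0.1, gamma=1.0 / L)
</syntaxhighlight>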
===Adaptive step size===
Consider a problem of the form <math>\min_w F(w) + R(w),</math> where <math>F(w)</math> is convex and differentiable and <math>R(w)</math> is convex (for example, <math>F</math> may be a differentiable [[Loss function|loss function]] and <math>R</math> a possibly nondifferentiable regularization penalty). For a fixed step size <math>\gamma>0</math>, the proximal gradient method is the fixed point iteration
:<math>w^{k+1} = \operatorname{prox}_{\gamma R}\left(w^k-\gamma \nabla F\left(w^k\right)\right).</math>
One standard modification is to let the step size <math>\gamma_k</math> vary with the iteration number <math>k</math>, which gives the scheme
:<math>w^{k+1} = \operatorname{prox}_{\gamma_k R}\left(w^k-\gamma_k \nabla F\left(w^k\right)\right).</math>
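One concrete way to choose <math>\gamma_k</math> adaptively is a backtracking line search on the smooth term <math>F</math>, shrinking the step size until a quadratic upper bound holds at the candidate point. The Python sketch below assumes this backtracking rule; the helper names and the small lasso instance are illustrative assumptions rather than a prescribed scheme from the article.

<syntaxhighlight lang="python">
import numpy as np

def backtracking_prox_gradient(w, F, grad_F, prox_R, gamma0=1.0, beta=0.5, num_iters=200):
    # Proximal gradient iteration with a backtracking choice of the step size gamma_k.
    gamma = gamma0
    for _ in range(num_iters):
        g = grad_F(w)
        while True:
            w_new = prox_R(w - gamma * g, gamma)
            d = w_new - w
            # Accept gamma once the quadratic upper bound on F holds at the candidate point.
            if F(w_new) <= F(w) + g @ d + (d @ d) / (2.0 * gamma):
                break
            gamma *= beta  # step size too large: shrink and retry
        w = w_new
    return w

# Illustrative lasso instance: F is the least-squares term, prox_R soft-thresholds the l1 term.
rng = np.random.default_rng(0)
X, lam = rng.standard_normal((100, 20)), 0.1
y = X[:, 0] - 2.0 * X[:, 1] + 0.01 * rng.standard_normal(100)
F = lambda w: np.mean((y - X @ w) ** 2)
grad_F = lambda w: (2.0 / len(y)) * X.T @ (X @ w - y)
prox_R = lambda v, g: np.sign(v) * np.maximum(np.abs(v) - g * lam, 0.0)
w_hat = backtracking_prox_gradient(np.zeros(20), F, grad_F, prox_R)
</syntaxhighlight>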