Proximal gradient method
<math>
\min_{x \in \mathbb{R}^N} \sum_{i=1}^n f_i(x)
</math>
 
where <math>f_i: \mathbb{R}^N \rightarrow \mathbb{R},\ i = 1, \dots, n</math> are possibly non-differentiable [[convex functions]]. The lack of differentiability rules out conventional smooth optimization techniques like the [[Gradient descent|steepest descent method]] and the [[conjugate gradient method]], but proximal gradient methods can be used instead.
These methods proceed by a splitting step, in which the functions <math>f_1, \dots, f_n</math> are used individually so as to yield an easily [[wikt:implementable|implementable]] algorithm. They are called [[proximal]] because each non-differentiable function among <math>f_1, \dots, f_n</math> is involved via its [[Proximal operator|proximity operator]]. The iterative shrinkage thresholding algorithm,<ref>
{{cite journal | last1=Daubechies | first1=I | last2=Defrise | first2 = M | last3 = De Mol| first3 = C|author3-link= Christine De Mol | title=An iterative thresholding algorithm for linear inverse problems with a sparsity constraint |journal= Communications on Pure and Applied Mathematics|volume=57 | issue=11 |year=2004|pages=1413–1457| bibcode=2003math......7152D | arxiv=math/0307152 |doi=10.1002/cpa.20042}}</ref> [[Landweber iteration|projected Landweber]], projected gradient, [[alternating projection]]s, [[Alternating direction method of multipliers#Alternating direction method of multipliers|alternating-direction method of multipliers]], alternating
split [[Bregman method|Bregman]] are special instances of proximal algorithms.<ref>Details of proximal methods are discussed in {{cite arXiv |last1=Combettes |first1=Patrick L. |last2= Pesquet |first2=Jean-Christophe |title=Proximal Splitting Methods in Signal Processing|page=|year=2009 |eprint=0912.3522|class=math.OC }}</ref> For the theory of proximal gradient methods from the perspective of and with applications to [[statistical learning theory]], see [[proximal gradient methods for learning]].
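As an illustration (not part of the article's own material), the iterative shrinkage thresholding algorithm mentioned above can be sketched for the lasso problem <math>\min_x \tfrac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1</math>: the smooth squared-error term contributes a gradient step, and the non-smooth <math>\ell_1</math> term enters only through its proximity operator, which is soft thresholding. Function names and the fixed step size below are illustrative choices, not from the article.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximity operator of tau * ||.||_1 (soft thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def ista(A, b, lam, n_iter=500):
    # Proximal gradient (ISTA) sketch for min 0.5*||Ax-b||^2 + lam*||x||_1.
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the
    # gradient of the smooth term.
    L = np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                   # gradient of the smooth part
        x = soft_threshold(x - grad / L, lam / L)  # proximal step on the l1 part
    return x
```

For an orthonormal <math>A</math> the iteration reaches the closed-form solution <math>\operatorname{soft}(A^\mathsf{T} b, \lambda)</math> in a single step, which makes the splitting structure easy to check by hand.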
 
== Notations and terminology ==