Proximal gradient method

Proximal gradient methods are a generalized form of projection used to solve non-differentiable convex optimization problems. Many interesting problems can be formulated as convex optimization problems of the form

\operatorname {min} \limits _{x\in \mathbb {R} ^{N}}\sum _{i=1}^{n}f_{i}(x)

where $f_{i}:\mathbb {R} ^{N}\rightarrow \mathbb {R} ,\ i=1,\dots ,n$ are convex functions, some of which may be non-differentiable. This rules out conventional smooth optimization techniques such as the steepest descent method and the conjugate gradient method. Proximal methods are a specific class of algorithms that can solve the above optimization problem. These methods proceed by splitting, in that the functions $f_{1},\dots ,f_{n}$ are used individually so as to yield an easily implementable algorithm. They are called proximal because each non-smooth function among $f_{1},\dots ,f_{n}$ is involved via its proximity operator. The iterative shrinkage-thresholding algorithm, projected Landweber, projected gradient, alternating projections, the alternating-direction method of multipliers, and alternating split Bregman are special instances of proximal algorithms. Details of proximal methods are discussed in Combettes and Pesquet.^[1] For the theory of proximal gradient methods from the perspective of, and with applications to, statistical learning theory, see proximal gradient methods for learning.

Notations and terminology

Let $\mathbb {R} ^{N}$, the $N$-dimensional Euclidean space, be the domain of the function $f:\mathbb {R} ^{N}\rightarrow (-\infty ,+\infty ]$. Suppose $C$ is a non-empty convex subset of $\mathbb {R} ^{N}$. Then the indicator function of $C$ is defined as

\iota _{C}:x\mapsto {\begin{cases}0&{\text{if }}x\in C\\+\infty &{\text{if }}x\notin C\end{cases}}

The $p$-norm $\|\cdot \|_{p}$ is defined as

\|x\|_{p}=(|x_{1}|^{p}+|x_{2}|^{p}+\cdots +|x_{N}|^{p})^{1/p}

The distance from $x\in \mathbb {R} ^{N}$ to $C$ is defined as

D_{C}(x)=\min _{y\in C}\|x-y\|_{2}

If $C$ is closed and convex, the projection of $x\in \mathbb {R} ^{N}$ onto $C$ is the unique point $P_{C}x\in C$ such that $D_{C}(x)=\|x-P_{C}x\|_{2}$ .
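As an illustration of the distance and projection just defined, take $C$ to be the closed Euclidean ball of radius $r$ centered at the origin. This choice of set is an assumption made only for this example; for it, the projection rescales any point outside the ball onto the boundary. A minimal Python sketch, with hypothetical helper names:

```python
import math

def project_onto_ball(x, radius=1.0):
    """Project x onto the closed Euclidean ball {y : ||y||_2 <= radius}."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)           # x already lies in C, so P_C x = x
    scale = radius / norm        # otherwise shrink x radially onto the boundary
    return [scale * v for v in x]

def distance_to_ball(x, radius=1.0):
    """D_C(x) = ||x - P_C x||_2 for the ball C."""
    p = project_onto_ball(x, radius)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, p)))

x = [3.0, 4.0]                   # ||x||_2 = 5
p = project_onto_ball(x)         # approximately [0.6, 0.8]
d = distance_to_ball(x)          # approximately 4.0
```

Other closed convex sets (boxes, hyperplanes, half-spaces) admit similarly simple closed-form projections, which is what makes projection-based methods practical.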

The subdifferential of $f$ at $x$ is given by

\partial f(x)=\{u\in \mathbb {R} ^{N}\mid \forall y\in \mathbb {R} ^{N},(y-x)^{\mathrm {T} }u+f(x)\leq f(y)\}

Projection onto convex sets (POCS)

One of the most widely used convex optimization algorithms is projection onto convex sets (POCS). This algorithm is employed to recover or synthesize a signal that simultaneously satisfies several convex constraints. Let $f_{i}$ be the indicator function of a non-empty closed convex set $C_{i}$ modeling a constraint. The problem then reduces to a convex feasibility problem, which requires finding a point that lies in the intersection of all the convex sets $C_{i}$. In the POCS method each set $C_{i}$ is incorporated through its projection operator $P_{C_{i}}$, so in each iteration $x$ is updated as

x_{k+1}=P_{C_{1}}P_{C_{2}}\cdots P_{C_{n}}x_{k}
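The POCS update above can be sketched in a few lines of Python. The two constraint sets used here, the box $[0,1]^{2}$ and the hyperplane $\{x\mid x_{1}+x_{2}=1.5\}$, as well as all function names, are assumptions chosen only for illustration; they are not taken from the cited literature.

```python
def project_box(x, lo=0.0, hi=1.0):
    """Projection onto the box [lo, hi]^N: clip each coordinate."""
    return [min(max(v, lo), hi) for v in x]

def project_hyperplane(x, a, b):
    """Projection onto the hyperplane {y : a^T y = b}."""
    residual = sum(ai * xi for ai, xi in zip(a, x)) - b
    norm_sq = sum(ai * ai for ai in a)
    return [xi - residual / norm_sq * ai for ai, xi in zip(a, x)]

def pocs(x0, projections, iterations=100):
    """Iterate x_{k+1} = P_{C_1} P_{C_2} ... P_{C_n} x_k.

    `projections` lists [P_{C_1}, ..., P_{C_n}]; the rightmost operator
    in the product is applied first, hence the reversed loop.
    """
    x = list(x0)
    for _ in range(iterations):
        for proj in reversed(projections):
            x = proj(x)
    return x

on_plane = lambda v: project_hyperplane(v, [1.0, 1.0], 1.5)
x_feasible = pocs([3.0, -1.0], [project_box, on_plane])
# x_feasible lies in the box and (numerically) on the hyperplane
```

Because the two sets intersect, the iterates settle on a point satisfying both constraints; POCS finds some feasible point, not a particular optimal one.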

However, beyond such problems projection operators are not appropriate, and more general operators are required to tackle them. Among the various generalizations of the notion of a convex projection operator that exist, proximity operators are the best suited for other purposes.

Definition

The proximity operator of a convex function $f$ at $x$ is defined as the unique solution to

\operatorname {argmin} \limits _{y}{\bigg (}f(y)+{\frac {1}{2}}\left\|x-y\right\|_{2}^{2}{\bigg )}

and is denoted $\operatorname {prox} _{f}(x)$ .

\operatorname {prox} _{f}:\mathbb {R} ^{N}\rightarrow \mathbb {R} ^{N}

Note that in the specific case where $f$ is the indicator function $\iota _{C}$ of some non-empty closed convex set $C$,

{\begin{aligned}\operatorname {prox} _{\iota _{C}}(x)&=\operatorname {argmin} \limits _{y}{\begin{cases}{\frac {1}{2}}\left\|x-y\right\|_{2}^{2}&{\text{if }}y\in C\\+\infty &{\text{if }}y\notin C\end{cases}}\\&=\operatorname {argmin} \limits _{y\in C}{\frac {1}{2}}\left\|x-y\right\|_{2}^{2}\\&=P_{C}(x)\end{aligned}}

showing that the proximity operator is indeed a generalisation of the projection operator.
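For a non-smooth function that is not an indicator, a standard example (stated here without derivation, and not part of the text above) is $f=\lambda \|\cdot \|_{1}$, whose proximity operator is componentwise soft thresholding. The following Python sketch, with hypothetical function names, evaluates it and provides the objective from the definition so that the minimizing property can be checked numerically:

```python
def soft_threshold(x, lam):
    """Proximity operator of f(y) = lam * ||y||_1, applied componentwise."""
    out = []
    for v in x:
        if v > lam:
            out.append(v - lam)    # shrink positive entries toward zero
        elif v < -lam:
            out.append(v + lam)    # shrink negative entries toward zero
        else:
            out.append(0.0)        # entries with |v| <= lam are set to zero
    return out

def prox_objective(y, x, lam):
    """f(y) + 0.5 * ||x - y||_2^2 with f = lam * ||.||_1."""
    l1 = lam * sum(abs(v) for v in y)
    quad = 0.5 * sum((a - b) ** 2 for a, b in zip(x, y))
    return l1 + quad

x = [2.0, -0.3, 0.7]
p = soft_threshold(x, lam=0.5)     # approximately [1.5, 0.0, 0.2]
```

Evaluating `prox_objective` at `p` and at nearby points gives a quick sanity check that `p` minimizes the objective in the definition.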

The proximity operator of $f$ is characterized by the inclusion

p=\operatorname {prox} _{f}(x)\Leftrightarrow x-p\in \partial f(p)\qquad (\forall (x,p)\in \mathbb {R} ^{N}\times \mathbb {R} ^{N})

If $f$ is differentiable, then the above inclusion reduces to

p=\operatorname {prox} _{f}(x)\Leftrightarrow x-p=\nabla f(p)\quad (\forall (x,p)\in \mathbb {R} ^{N}\times \mathbb {R} ^{N})
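The differentiable case can be checked on a simple example. Assume, purely for illustration, $f(y)={\frac {a}{2}}\|y\|_{2}^{2}$ with $a>0$, so that $\nabla f(p)=ap$; solving $x-p=ap$ gives $p=x/(1+a)$. The Python sketch below (hypothetical names) verifies the condition numerically:

```python
def prox_quadratic(x, a):
    """prox of f(y) = (a / 2) * ||y||_2^2, obtained by solving x - p = a * p."""
    return [v / (1.0 + a) for v in x]

def grad_quadratic(p, a):
    """Gradient of f(y) = (a / 2) * ||y||_2^2 evaluated at p."""
    return [a * v for v in p]

x = [3.0, -1.5]
a = 2.0
p = prox_quadratic(x, a)                  # [1.0, -0.5]
lhs = [xi - pi for xi, pi in zip(x, p)]   # x - p
rhs = grad_quadratic(p, a)                # grad f(p); matches lhs
```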

Examples

Special instances of proximal gradient methods include

Projected Landweber
Alternating projection
Alternating-direction method of multipliers
Fast Iterative Shrinkage Thresholding Algorithm (FISTA)^[2]
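To make the listed methods more concrete, the sketch below implements the basic (non-accelerated) iterative shrinkage-thresholding iteration for the problem $\min _{x}{\frac {1}{2}}\|Ax-b\|_{2}^{2}+\lambda \|x\|_{1}$. It uses the standard proximal gradient update $x_{k+1}=\operatorname {prox} _{\gamma g}(x_{k}-\gamma \nabla f(x_{k}))$, which is not written out in the text above; the problem instance, the function names, and the conservative step size $\gamma =1/\|A\|_{F}^{2}$ are all assumptions made for this example.

```python
def soft_threshold(x, t):
    """Proximity operator of t * ||.||_1 (componentwise shrinkage)."""
    return [v - t if v > t else v + t if v < -t else 0.0 for v in x]

def ista(A, b, lam, iterations=500):
    """Proximal gradient iteration for 0.5 * ||A x - b||_2^2 + lam * ||x||_1."""
    m, n = len(A), len(A[0])
    # The squared Frobenius norm upper-bounds the Lipschitz constant of the
    # gradient (largest eigenvalue of A^T A), so this step size is safe.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step on the smooth term, then prox step on the l1 term.
        x = soft_threshold([x[j] - step * grad[j] for j in range(n)], step * lam)
    return x

A = [[1.0, 0.0], [0.0, 1.0]]
b = [2.0, -0.3]
x_hat = ista(A, b, lam=0.5)   # approaches [1.5, 0.0]
```

With $A$ equal to the identity the minimizer is the soft-thresholded vector $b$, which gives a simple check of the iteration. FISTA adds a momentum (extrapolation) step on top of this update, which is omitted here.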

Notes

[1] Combettes, Patrick L.; Pesquet, Jean-Christophe (2009). "Proximal Splitting Methods in Signal Processing". arXiv:0912.3522.
[2] Beck, A.; Teboulle, M. (2009). "A fast iterative shrinkage-thresholding algorithm for linear inverse problems". SIAM Journal on Imaging Sciences. Vol. 2. pp. 183–202.

References

Rockafellar, R. T. (1970). Convex analysis. Princeton: Princeton University Press.
Combettes, Patrick L.; Pesquet, Jean-Christophe (2011). "Proximal Splitting Methods in Signal Processing". Fixed-Point Algorithms for Inverse Problems in Science and Engineering. Springer Optimization and Its Applications. Vol. 49. Springer. pp. 185–212.

External links

Stephen Boyd and Lieven Vandenberghe Book, Convex optimization
EE364a: Convex Optimization I and EE364b: Convex Optimization II, Stanford course homepages
EE227A: Lieven Vandenberghe Notes Lecture 18
ProximalOperators.jl: a Julia package implementing proximal operators.
ProximalAlgorithms.jl: a Julia package implementing algorithms based on the proximal operator, including the proximal gradient method.
Proximity Operator repository: a collection of proximity operators implemented in Matlab and Python.
