{{Short description|Form of projection}}
{{more footnotes|date=November 2013}}
 
'''Proximal gradient methods''' are a generalized form of projection used to solve non-differentiable [[convex optimization]] problems.
[[File:Frank_Wolfe_vs_Projected_Gradient.webm|thumb|A comparison between the iterates of the projected gradient method (in red) and the [[Frank–Wolfe algorithm|Frank–Wolfe method]] (in green).]]
[[Convex optimization]] is a subfield of optimization whose problems can be solved reliably and exactly. Many interesting problems, including many in signal processing, can be formulated as convex optimization problems of the form
 
:<math>
\operatorname{minimize}_{x \in \mathbb{R}^N} \quad f_1(x) + f_2(x) + \cdots + f_{n-1}(x) + f_n(x)
</math>
 
where <math>f_1, f_2, \ldots, f_n</math> are possibly non-differentiable [[convex functions]] <math>f_i: \mathbb{R}^N \rightarrow \mathbb{R}</math>. The lack of differentiability rules out conventional smooth optimization techniques like the [[Gradient descent|steepest descent method]] and the [[conjugate gradient method]], but proximal gradient methods can be used instead.

Proximal gradient methods start with a splitting step, in which the functions <math>f_1, \ldots, f_n</math> are used individually so as to yield an easily [[wikt:implementable|implementable]] algorithm. They are called [[proximal]] because each non-differentiable function among <math>f_1, \ldots, f_n</math> is involved via its [[Proximal operator|proximity operator]]. The iterative shrinkage-thresholding algorithm,<ref>{{cite journal | last1=Daubechies | first1=I | last2=Defrise | first2=M | last3=De Mol | first3=C |author3-link=Christine De Mol | title=An iterative thresholding algorithm for linear inverse problems with a sparsity constraint |journal=Communications on Pure and Applied Mathematics |volume=57 | issue=11 |year=2004 |pages=1413–1457 | bibcode=2003math......7152D | arxiv=math/0307152 |doi=10.1002/cpa.20042}}</ref> [[Landweber iteration|projected Landweber]], projected gradient, [[alternating projection]]s, [[Alternating direction method of multipliers#Alternating direction method of multipliers|alternating-direction method of multipliers]], and alternating split [[Bregman method|Bregman]] are special instances of proximal algorithms.<ref>Details of proximal methods are discussed in {{cite arXiv |last1=Combettes |first1=Patrick L. |last2=Pesquet |first2=Jean-Christophe |title=Proximal Splitting Methods in Signal Processing |year=2009 |eprint=0912.3522 |class=math.OC}}</ref>

For the theory of proximal gradient methods from the perspective of and with applications to [[statistical learning theory]], see [[proximal gradient methods for learning]].
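As a concrete illustration of the splitting idea, the sketch below applies a proximal gradient iteration to the lasso-type problem <math>\min_x \tfrac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1</math>: the smooth least-squares term is handled by a gradient step and the non-differentiable <math>\ell_1</math> term by its proximity operator (soft-thresholding). This is only a minimal sketch; the function names, the fixed step size <math>1/L</math>, and the iteration count are illustrative choices, not part of a standard implementation.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(A, b, lam, num_iters=500):
    """Minimize 0.5 * ||A x - b||_2^2 + lam * ||x||_1 by proximal gradient steps."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, with L the Lipschitz constant of the gradient
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                         # gradient step on the smooth term
        x = soft_threshold(x - step * grad, step * lam)  # proximal step on the non-smooth term
    return x
</syntaxhighlight>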
 
== Notations and Terminology ==
Let <math>\mathbb{R}^N</math> be the <math>N</math>-dimensional [[Euclidean space]], the ___domain of the function <math> f: \mathbb{R}^N \rightarrow \,]-\infty,+\infty]</math>, and let <math>C</math> be a non-empty convex subset of <math>\mathbb{R}^N</math>. The [[indicator function]] of <math>C</math> is defined as
 
:<math>
i_C : x \mapsto \begin{cases} 0 & \text{if } x \in C \\ +\infty & \text{if } x \notin C \end{cases}
</math>
 
The <math>p</math>-norm (<math>\| \cdot \|_p</math>) of <math>x \in \mathbb{R}^N</math> is defined as
 
:<math>
\| x \|_p = \left( |x_1|^p + |x_2|^p + \cdots + |x_N|^p \right)^{\frac{1}{p}}
</math>
 
The distance from <math>x \in \mathbb{R}^N</math> to <math>C</math> is defined as
 
:<math>
D_C(x) = \min_{y \in C} \| x - y \|_2
</math>
 
If <math>C</math> is closed and convex, the projection of <math>x \in \mathbb{R}^N</math> onto <math>C</math> is the unique point
<math>P_Cx \in C</math> such that <math>D_C(x) = \| x - P_Cx \|_2 </math>.

The [[subderivative|subdifferential]] of <math>f</math> is given by

:<math>
\partial f : \mathbb{R}^N \rightarrow 2^{\mathbb{R}^N} : x \mapsto \{ u \in \mathbb{R}^N \mid (\forall y \in \mathbb{R}^N)\ (y-x)^\mathrm{T}u + f(x) \leq f(y) \}.
</math>
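For a concrete case of the projection and distance defined above, take <math>C</math> to be the closed Euclidean ball of radius <math>r</math>, for which both <math>P_C</math> and <math>D_C</math> have closed forms. The snippet below is only an illustrative sketch; the function names and the choice of set are assumptions made here, not standard library routines.

<syntaxhighlight lang="python">
import numpy as np

def project_ball(x, r=1.0):
    """Projection P_C(x) onto the closed Euclidean ball C = {y : ||y||_2 <= r}."""
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

def distance_ball(x, r=1.0):
    """Distance D_C(x) = ||x - P_C(x)||_2 from x to the ball C."""
    return np.linalg.norm(x - project_ball(x, r))
</syntaxhighlight>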
 
== Projection onto convex sets (POCS) ==
One of the widely used convex optimization algorithms is [[projections onto convex sets]] (POCS). This algorithm is employed to recover/synthesize a signal satisfying simultaneously several convex constraints. Let <math>f_i</math> be the indicator function of the non-empty closed convex set <math>C_i</math> modeling a constraint. This reduces to a convex feasibility problem, which requires us to find a solution lying in the intersection of all the convex sets <math>C_i</math>. In the POCS method each set <math>C_i</math> is incorporated by its [[projection operator]] <math>P_{C_i}</math>, so in each [[iteration]] <math>x</math> is updated as
:<math>
x_{k+1} = P_{C_1} P_{C_2} \cdots P_{C_n} x_k
</math>
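As an illustrative sketch of this iteration, the snippet below alternates the projections onto two example sets chosen here for concreteness (a box <math>C_1</math> and a half-space <math>C_2</math>, both of which have closed-form projections); the set choices and function names are assumptions made for this sketch.

<syntaxhighlight lang="python">
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    """Projection onto the box C_1 = {y : lo <= y_i <= hi}."""
    return np.clip(x, lo, hi)

def project_halfspace(x, a, beta):
    """Projection onto the half-space C_2 = {y : a^T y <= beta}."""
    violation = a @ x - beta
    return x if violation <= 0 else x - (violation / (a @ a)) * a

def pocs(x0, a, beta, num_iters=100):
    """POCS iteration x_{k+1} = P_{C_1} P_{C_2} x_k."""
    x = x0
    for _ in range(num_iters):
        x = project_box(project_halfspace(x, a, beta))
    return x
</syntaxhighlight>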
 
 
However, beyond such problems [[projection operator]]s are not appropriate and more general operators are required to tackle them. Among the various generalizations of the notion of a convex projection operator that exist, proximity operators are best suited for other purposes.
 
== Definition ==
The [[Proximal operator|proximity operator]] of a function <math>f</math> is defined as follows: for every <math>x \in \mathbb{R}^N </math>, the minimization problem
 
:<math>
\operatorname{minimize}_{y \in \mathbb{R}^N} \quad f(y) + \frac{1}{2} \| x-y \|_2^2
</math>
admits a unique solution, which is denoted by <math>\operatorname{prox}_f(x)</math>. This defines the proximity operator

:<math>
\operatorname{prox}_f : \mathbb{R}^N \rightarrow \mathbb{R}^N.
</math>
 
The proximity operator of <math>f</math> is characterized by the inclusion
 
:<math>
p=\operatorname{prox}_f(x) \quad \Leftrightarrow \quad x-p \in \partial f(p) \qquad \forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N.
</math>
 
If <math>f</math> is differentiable, then the above equation reduces to
 
:<math>
p=\operatorname{prox}_f(x) \quad \Leftrightarrow \quad x-p = \nabla f(p) \qquad \forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N.
</math>
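As a numerical illustration of the inclusion characterization above (an assumed example using <math>f = \lambda\|\cdot\|_1</math>, whose proximity operator is soft-thresholding and whose subdifferential is known componentwise), one can check that <math>x - \operatorname{prox}_f(x)</math> indeed lies in <math>\partial f(\operatorname{prox}_f(x))</math>; the helper names below are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def prox_l1(x, lam):
    """Proximity operator of f = lam * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def in_subdifferential_l1(u, p, lam, tol=1e-12):
    """Check componentwise whether u belongs to the subdifferential of lam * ||.||_1 at p."""
    on_support = (p != 0) & (np.abs(u - lam * np.sign(p)) <= tol)  # u_i = lam * sign(p_i)
    off_support = (p == 0) & (np.abs(u) <= lam + tol)              # |u_i| <= lam
    return bool(np.all(on_support | off_support))

x = np.array([2.0, -0.3, 0.5, -4.0])
lam = 1.0
p = prox_l1(x, lam)
print(in_subdifferential_l1(x - p, p, lam))  # True: x - p lies in the subdifferential at p
</syntaxhighlight>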
 
== Examples ==
Special instances of proximal gradient methods are
*[[Landweber iteration|Projected Landweber]]
*[[Basis pursuit]]
*[[Alternating projection]]
*[[Alternating direction method of multipliers#Alternating direction method of multipliers|Alternating-direction method of multipliers]]
*Fast iterative shrinkage thresholding algorithm (FISTA)<ref>{{cite journal |last1=Beck |first1=A. |last2=Teboulle |first2=M. |title=A fast iterative shrinkage-thresholding algorithm for linear inverse problems |journal=SIAM Journal on Imaging Sciences |volume=2 |year=2009 |pages=183–202}}</ref>
 
== See also ==
* [[Proximal operator]]
* [[Alternating projection]]
* [[Convex optimization]]
* [[Frank–Wolfe algorithm]]
* [[Proximal gradient methods for learning]]
 
== Notes ==
{{reflist}}
 
== References ==
*{{cite book
 | last1 = Combettes
 | first1 = Patrick L.
 | last2 = Pesquet
 | first2 = Jean-Christophe
 | chapter = Proximal Splitting Methods in Signal Processing
 | title = Fixed-Point Algorithms for Inverse Problems in Science and Engineering
 | publisher = Springer
 | volume = 49
 | year = 2011
 | pages = 185–212
}}
 
 
==External links==
* Stephen Boyd and Lieven Vandenberghe, [https://web.stanford.edu/~boyd/cvxbook/ ''Convex Optimization''] (book)
* [https://web.stanford.edu/class/ee364a/ EE364a: Convex Optimization I] and [https://web.stanford.edu/class/ee364b/ EE364b: Convex Optimization II], Stanford course homepages
* [https://people.eecs.berkeley.edu/~elghaoui/Teaching/EE227A/lecture18.pdf EE227A: Lieven Vandenberghe Notes] Lecture 18
* [https://github.com/kul-forbes/ProximalOperators.jl ProximalOperators.jl]: a [[Julia (programming language)|Julia]] package implementing proximal operators.
* [https://github.com/kul-forbes/ProximalAlgorithms.jl ProximalAlgorithms.jl]: a [[Julia (programming language)|Julia]] package implementing algorithms based on the proximal operator, including the proximal gradient method.
* [http://proximity-operator.net/ Proximity Operator repository]: a collection of proximity operators implemented in [[Matlab]] and [[Python (programming language)|Python]].
 
[[Category:Gradient methods]]