Proximal gradient method
'''Proximal gradient methods''' are a generalized form of projection used to solve non-differentiable [[convex optimization]] problems. Many interesting problems can be formulated as convex optimization problems of the form
 
:<math>
\operatorname{minimize}_{x \in \mathbb{R}^N} \quad f_1(x) + f_2(x) + \cdots + f_{n-1}(x) + f_n(x)
</math>
 
where <math>f_1, f_2, \ldots, f_n</math> are [[convex function]]s from <math>\mathbb{R}^N</math> to <math>\mathbb{R}</math>.<ref>{{cite news |last1=Combettes |first1=Patrick L. |last2=Pesquet |first2=Jean-Christophe |title=Proximal Splitting Methods in Signal Processing |pages=11–23 |year=2009 |url=http://arxiv.org/abs/0912.3522}}</ref> For the theory of proximal gradient methods from the perspective of and with applications to [[statistical learning theory]], see [[proximal gradient methods for learning]].
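A typical instance of this splitting (an illustrative example, not drawn from the surrounding text) is the [[lasso (statistics)|lasso]] problem, in which a differentiable data-fidelity term is paired with a non-differentiable regularizer:

:<math>
\operatorname{minimize}_{x \in \mathbb{R}^N} \quad \underbrace{\tfrac{1}{2} \|Ax - b\|_2^2}_{f_1(x)} + \underbrace{\lambda \|x\|_1}_{f_2(x)}.
</math>

Here <math>f_1</math> is smooth while <math>f_2</math> is convex but non-differentiable, so plain gradient descent does not apply; this is the situation the proximal machinery below is designed for.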
 
== Notations and terminology ==
Let <math>\mathbb{R}^N</math>, the <math>N</math>-dimensional [[Euclidean space]], be the ___domain of the function
<math>f : \mathbb{R}^N \rightarrow [-\infty,+\infty]</math>. Suppose <math>C</math> is a non-empty
convex subset of <math>\mathbb{R}^N</math>. Then the indicator function of <math>C</math> is defined as
 
: <math> i_C : x \mapsto \begin{cases}
0 & \text{if } x \in C \\
+\infty & \text{if } x \notin C
\end{cases}
</math>
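The indicator function turns a constrained problem into an unconstrained one: minimizing <math>f</math> over <math>C</math> is the same as minimizing <math>f + i_C</math> over all of <math>\mathbb{R}^N</math>. A standard fact (stated here as context, not taken from the text above) is that the proximity operator of <math>i_C</math>, defined later in this article, is exactly the projection onto <math>C</math>:

: <math>
\operatorname{prox}_{i_C}(x) = \operatorname{arg\,min}_{y \in C} \tfrac{1}{2} \|x - y\|_2^2 = P_C(x).
</math>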
 
The <math>p</math>-norm (<math>\|\cdot\|_p</math>) is defined as
 
: <math>
\|x\|_p = ( |x_1|^p + |x_2|^p + \cdots + |x_N|^p )^{1/p}
</math>
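For a concrete illustration (numbers chosen here for exposition), take <math>x = (3, -4) \in \mathbb{R}^2</math>: then <math>\|x\|_1 = |3| + |-4| = 7</math> and <math>\|x\|_2 = \sqrt{9 + 16} = 5</math>.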
 
The distance from <math>x \in \mathbb{R}^N</math> to <math>C</math> is defined as
 
: <math>
D_C(x) = \min_{y \in C} \|x - y\|
</math>
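For example (again illustrative, not from the text above), if <math>C</math> is the closed unit ball <math>\{ y : \|y\|_2 \leq 1 \}</math>, then <math>D_C(x) = \max( \|x\|_2 - 1,\, 0 )</math>: points inside the ball are at distance zero, and points outside are at distance <math>\|x\|_2 - 1</math> from their radial projection onto the sphere.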
The [[subdifferential]] of <math>f</math> is given by
 
: <math>
\partial f : \mathbb{R}^N \rightarrow 2^{\mathbb{R}^N} : x \mapsto \{ u \in \mathbb{R}^N \mid \forall y \in \mathbb{R}^N, (y-x)^\mathrm{T} u + f(x) \leq f(y) \}.
</math>
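A standard one-dimensional example: for <math>f(x) = |x|</math>, the subdifferential is <math>\partial f(x) = \{ \operatorname{sign}(x) \}</math> for <math>x \neq 0</math> and <math>\partial f(0) = [-1, 1]</math>, since every line through the origin with slope in <math>[-1, 1]</math> lies below the graph of <math>|x|</math>.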
 
== Projection onto convex sets (POCS) ==
 
One widely used convex optimization algorithm is [[Projections onto convex sets|projection onto convex sets]] (POCS).
Each set <math>C_i</math> is incorporated via its projection operator <math>P_{C_i}</math>. In each [[iteration]], <math>x</math> is updated as
 
: <math>
x_{k+1} = P_{C_1} P_{C_2} \cdots P_{C_n} x_k
</math>
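The update above is straightforward to implement whenever each projection has a closed form. The following is a minimal sketch using NumPy (the sets, data, and iteration count are chosen here for illustration, not taken from the article): it alternates projections onto a hyperplane and a Euclidean ball in <math>\mathbb{R}^2</math>.

<syntaxhighlight lang="python">
import numpy as np

# Minimal POCS sketch: find a point in the intersection of
#   C1 = {x : a^T x = b}     (a hyperplane)
#   C2 = {x : ||x||_2 <= r}  (a Euclidean ball)
# Both projection operators are known in closed form.

def project_hyperplane(x, a, b):
    # P_{C1}(x) = x - ((a^T x - b) / ||a||^2) * a
    return x - ((a @ x - b) / (a @ a)) * a

def project_ball(x, r):
    # P_{C2}(x) = x if x is inside the ball, else its radial projection
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

a, b, r = np.array([1.0, 2.0]), 3.0, 2.0  # example data, chosen arbitrarily
x = np.array([5.0, -4.0])                 # arbitrary starting point
for _ in range(100):                      # x_{k+1} = P_{C1} P_{C2} x_k
    x = project_hyperplane(project_ball(x, r), a, b)
print(x)  # approaches a point in C1 ∩ C2 (non-empty for this data)
</syntaxhighlight>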
 
For every <math>x \in \mathbb{R}^N </math>, the minimization problem
:<math>
\text{minimize}_{y \in C} \quad f(y) + \frac{1}{2} \left\| x-y \right\|_2^2
</math>
admits a unique solution, which is denoted by <math>\operatorname{prox}_f(x)</math>.
 
: <math>
\operatorname{prox}_f : \mathbb{R}^N \rightarrow \mathbb{R}^N
</math>
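For many functions of practical interest the proximity operator has a closed form. A standard example (stated here for illustration): for <math>f(y) = \lambda \|y\|_1</math> with <math>\lambda > 0</math> and <math>C = \mathbb{R}^N</math>, the proximity operator is the componentwise soft-thresholding map

: <math>
\left( \operatorname{prox}_f(x) \right)_i = \operatorname{sign}(x_i) \max( |x_i| - \lambda,\, 0 ).
</math>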
Line 84 ⟶ 82:
The proximity operator of <math>f</math> is characterized by the inclusion
 
: <math>
p = \operatorname{prox}_f(x) \Leftrightarrow x - p \in \partial f(p) \qquad (\forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N)
</math>
 
If <math>f</math> is differentiable, then the above equation reduces to
 
: <math>
p = \operatorname{prox}_f(x) \Leftrightarrow x - p = \nabla f(p) \qquad (\forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N)
</math>
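As a quick check of this characterization (a worked example, not from the original text), take <math>f(y) = \tfrac{1}{2} \|y\|_2^2</math>, so that <math>\nabla f(p) = p</math>. The condition <math>x - p = \nabla f(p)</math> gives <math>x - p = p</math>, hence

: <math>
\operatorname{prox}_f(x) = \tfrac{x}{2},
</math>

which can also be verified directly by minimizing <math>\tfrac{1}{2} \|y\|_2^2 + \tfrac{1}{2} \|x-y\|_2^2</math> over <math>y</math>.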
 
Special instances of proximal gradient methods are
*[[Landweber iteration|Projected Landweber]]
*[[Alternating projection]]
*[[Alternating direction method of multipliers]]
*Fast Iterative Shrinkage-Thresholding Algorithm (FISTA)
 
== See also ==
* [[Alternating projection]]
* [[Convex optimization]]
* [[Frank–Wolfe algorithm]]
* [[Proximal gradient methods for learning]]