'''Proximal gradient methods''' are a generalized form of projection used to solve non-differentiable [[convex optimization]] problems. Many interesting problems can be formulated as convex optimization problems of the form
:<math>
\min_{x \in \mathbb{R}^N} \; f_1(x) + f_2(x) + \cdots + f_n(x)
</math>
where <math>f_1, f_2, \ldots, f_n</math> are [[convex functions]] defined from <math>\mathbb{R}^N</math> to <math>\mathbb{R}</math>, some of which may be non-differentiable.
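For instance, the well-known [[Lasso (statistics)|lasso]] problem is of this form with <math>n = 2</math>: <math>f_1(x) = \tfrac{1}{2}\|Ax - b\|_2^2</math> is smooth, while <math>f_2(x) = \lambda \|x\|_1</math> is convex but non-differentiable.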
{{cite news |last1=Combettes |first1=Patrick L. |last2=Pesquet |first2=Jean-Christophe |title=Proximal Splitting Methods in Signal Processing |pages=11–23 |year=2009 |url=http://arxiv.org/abs/0912.3522}}</ref> For the theory of proximal gradient methods from the perspective of, and with applications to, [[statistical learning theory]], see [[proximal gradient methods for learning]].
== Notations and terminology ==
Let <math>\mathbb{R}^N</math>, the <math>N</math>-dimensional [[Euclidean space]], be the ___domain of the function
<math> f: \mathbb{R}^N \rightarrow [-\infty,+\infty]</math>. Suppose <math>C</math> is a non-empty
convex subset of <math>\mathbb{R}^N</math>. Then the indicator function of <math>C</math> is defined as
: <math> i_C : x \mapsto
\begin{cases}
0 & \text{if } x \in C \\
+\infty & \text{if } x \notin C
\end{cases}
</math>
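The indicator function allows a constraint to be expressed as a penalty: minimizing <math>f</math> over <math>C</math> is the same as minimizing <math>f + i_C</math> over all of <math>\mathbb{R}^N</math>.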
The <math>p</math>-norm ( <math>\| \cdot \|_p</math> ) is defined as
: <math>
\|x\|_p = ( |x_1|^p + |x_2|^p + \cdots + |x_N|^p )^{1/p}
</math>
The distance from <math>x \in \mathbb{R}^N</math> to <math>C</math> is defined as
: <math>
D_C(x) = \min_{y \in C} \|x - y\|
</math>
The [[subdifferential]] of <math>f</math> is given by
: <math>
\partial f : \mathbb{R}^N \rightarrow 2^{\mathbb{R}^N} : x \mapsto \{ u \in \mathbb{R}^N \mid \forall y \in \mathbb{R}^N, (y - x)^\mathrm{T} u + f(x) \leq f(y) \}.
</math>
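For example, for the one-dimensional function <math>f(x) = |x|</math>, the subdifferential is <math>\partial f(x) = \{\operatorname{sign}(x)\}</math> for <math>x \neq 0</math> and <math>\partial f(0) = [-1, 1]</math>.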
== Projection onto convex sets (POCS) ==
One of the widely used convex optimization algorithms is POCS ([[Projections onto convex sets|Projection Onto Convex Sets]]).
In the POCS method each convex set <math>C_i</math> is incorporated through its projection operator
<math>P_{C_i}</math>. So in each [[iteration]] <math>x</math> is updated as
: <math>
x_{k+1} = P_{C_1} P_{C_2} \cdots P_{C_n} x_k
</math>
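The following is a minimal sketch of this iteration for two sets, assuming (purely as an illustration, not an example from the references cited here) a Euclidean ball and a half-space in the plane, both of which have simple closed-form projection operators.

<syntaxhighlight lang="python">
import numpy as np

def project_ball(x, r=1.0):
    # Projection onto the Euclidean ball {x : ||x||_2 <= r}
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

def project_halfspace(x, a, b):
    # Projection onto the half-space {x : a.x <= b}
    violation = a @ x - b
    return x if violation <= 0 else x - (violation / (a @ a)) * a

# POCS: repeatedly apply the composition of the projection operators
a, b = np.array([1.0, 1.0]), 0.5
x = np.array([3.0, -2.0])
for _ in range(1000):
    x_next = project_ball(project_halfspace(x, a, b))
    if np.linalg.norm(x_next - x) < 1e-10:
        break
    x = x_next
print(x)  # a point lying in the intersection of the two sets
</syntaxhighlight>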
For every <math>x \in \mathbb{R}^N </math>, the minimization problem
:<math>
\underset{y \in C}{\text{minimize}} \quad f(y) + \frac{1}{2} \left\| x - y \right\|_2^2
</math>
admits a unique solution, which is denoted by <math>\operatorname{prox}_f(x)</math>. The resulting proximity operator is the map
: <math>
\operatorname{prox}_f : \mathbb{R}^N \rightarrow \mathbb{R}^N.
</math>
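In particular, a standard fact linking this notion to the projections used in POCS: if <math>f = i_C</math> is the indicator function of a non-empty closed convex set <math>C</math>, the proximity operator reduces to the projection operator onto <math>C</math>,
: <math>
\operatorname{prox}_{i_C}(x) = P_C(x).
</math>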
The proximity operator of <math>f</math> is characterized by the inclusion
: <math>
p = \operatorname{prox}_f(x) \Leftrightarrow x - p \in \partial f(p) \qquad (\forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N).
</math>
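As a concrete illustration (a standard example, not tied to the references above), for <math>f = \lambda \| \cdot \|_1</math> this inclusion yields the component-wise soft-thresholding operator. The following sketch implements it and checks it numerically against the defining minimization problem.

<syntaxhighlight lang="python">
import numpy as np

def prox_l1(x, lam):
    # Proximity operator of f(y) = lam * ||y||_1: each component of x
    # is shrunk towards zero by lam (soft-thresholding), which follows
    # from the inclusion x - p in lam * d||.||_1(p).
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

x = np.array([3.0, -0.2, 0.7, -1.5])
lam = 0.5
p = prox_l1(x, lam)
print(p)  # [ 2.5 -0.   0.2 -1. ]

# Sanity check in one dimension: minimize lam*|y| + 0.5*(y - x[0])**2 on a grid
grid = np.linspace(-5.0, 5.0, 100001)
objective = lam * np.abs(grid) + 0.5 * (grid - x[0]) ** 2
print(grid[np.argmin(objective)])  # approximately p[0] = 2.5
</syntaxhighlight>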
If <math>f</math> is differentiable, then the above inclusion reduces to the equation
: <math>
p = \operatorname{prox}_f(x) \Leftrightarrow x - p = \nabla f(p) \qquad (\forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N).
</math>
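For example, for <math>f(y) = \tfrac{1}{2}\|y\|_2^2</math> the equation <math>x - p = \nabla f(p) = p</math> gives <math>\operatorname{prox}_f(x) = \tfrac{x}{2}</math>.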
Special instances of proximal gradient methods are
*[[Landweber iteration|Projected Landweber]]
*[[Alternating projection]]
*[[Alternating direction method of multipliers#Alternating direction method of multipliers|Alternating-direction method of multipliers]]
*Fast Iterative Shrinkage Thresholding Algorithm (FISTA)<ref>
== See also ==
* [[Alternating projection]]
* [[Convex optimization]]
* [[Frank–Wolfe algorithm]]
* [[Proximal gradient methods for learning]]