Proximal gradient method
'''Proximal gradient methods''' are a generalized form of projection used to solve non-differentiable [[convex optimization]] problems. Many interesting problems can be formulated as convex optimization problems of the form
 
:<math>
\operatorname{minimize}_{x \in \mathbb{R}^N} \quad f_1(x) + f_2(x) + \cdots + f_{n-1}(x) + f_n(x)
</math>
 
where <math>f_1, f_2, \ldots, f_n</math> are [[convex function]]s from <math>\mathbb{R}^N</math> to <math>\mathbb{R}</math>.<ref>{{cite news |last1=Combettes |first1=Patrick L. |last2=Pesquet |first2=Jean-Christophe |title=Proximal Splitting Methods in Signal Processing |pages=11–23 |year=2009 |url=http://arxiv.org/abs/0912.3522}}</ref> For the theory of proximal gradient methods from the perspective of and with applications to [[statistical learning theory]], see [[proximal gradient methods for learning]].
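A typical instance of this splitting (an illustrative example, not drawn from the surrounding text) is the [[lasso (statistics)|lasso]] problem, in which a differentiable data-fidelity term is paired with a non-differentiable regularizer:

:<math>
\operatorname{minimize}_{x \in \mathbb{R}^N} \quad \underbrace{\tfrac{1}{2} \|Ax - b\|_2^2}_{f_1(x)} + \underbrace{\lambda \|x\|_1}_{f_2(x)}.
</math>

Here <math>f_1</math> is smooth while <math>f_2</math> is convex but non-differentiable, so plain gradient descent does not apply; this is the situation the proximal machinery below is designed for.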
 
== Notations and terminology ==
Let <math>\mathbb{R}^N</math>, the <math>N</math>-dimensional [[Euclidean space]], be the ___domain of the function
<math>f : \mathbb{R}^N \rightarrow [-\infty,+\infty]</math>. Suppose <math>C</math> is a non-empty
convex subset of <math>\mathbb{R}^N</math>. Then the indicator function of <math>C</math> is defined as
 
: <math> i_C : x \mapsto \begin{cases}
0 & \text{if } x \in C \\
+\infty & \text{if } x \notin C
\end{cases}
</math>
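The indicator function turns a constrained problem into an unconstrained one: minimizing <math>f</math> over <math>C</math> is the same as minimizing <math>f + i_C</math> over all of <math>\mathbb{R}^N</math>. A standard fact (stated here as context, not taken from the text above) is that the proximity operator of <math>i_C</math>, defined later in this article, is exactly the projection onto <math>C</math>:

: <math>
\operatorname{prox}_{i_C}(x) = \operatorname{arg\,min}_{y \in C} \tfrac{1}{2} \|x - y\|_2^2 = P_C(x).
</math>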
 
The <math>p</math>-norm (<math>\|\cdot\|_p</math>) is defined as
 
: <math>
\|x\|_p = ( |x_1|^p + |x_2|^p + \cdots + |x_N|^p )^{1/p}
</math>
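For a concrete illustration (numbers chosen here for exposition), take <math>x = (3, -4) \in \mathbb{R}^2</math>: then <math>\|x\|_1 = |3| + |-4| = 7</math> and <math>\|x\|_2 = \sqrt{9 + 16} = 5</math>.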
 
The distance from <math>x \in \mathbb{R}^N</math> to <math>C</math> is defined as
 
: <math>
D_C(x) = \min_{y \in C} \|x - y\|
</math>
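For example (again illustrative, not from the text above), if <math>C</math> is the closed unit ball <math>\{ y : \|y\|_2 \leq 1 \}</math>, then <math>D_C(x) = \max( \|x\|_2 - 1,\, 0 )</math>: points inside the ball are at distance zero, and points outside are at distance <math>\|x\|_2 - 1</math> from their radial projection onto the sphere.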
The [[subdifferential]] of <math>f</math> is given by
 
: <math>
\partial f : \mathbb{R}^N \rightarrow 2^{\mathbb{R}^N} : x \mapsto \{ u \in \mathbb{R}^N \mid \forall y \in \mathbb{R}^N, (y-x)^\mathrm{T} u + f(x) \leq f(y) \}.
</math>
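A standard one-dimensional example: for <math>f(x) = |x|</math>, the subdifferential is <math>\partial f(x) = \{ \operatorname{sign}(x) \}</math> for <math>x \neq 0</math> and <math>\partial f(0) = [-1, 1]</math>, since every line through the origin with slope in <math>[-1, 1]</math> lies below the graph of <math>|x|</math>.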
 
== Projection onto convex sets (POCS) ==
 
One widely used convex optimization algorithm is [[Projections onto convex sets|projection onto convex sets]] (POCS).
Each set <math>C_i</math> is incorporated via its projection operator <math>P_{C_i}</math>. In each [[iteration]], <math>x</math> is updated as
 
: <math>
x_{k+1} = P_{C_1} P_{C_2} \cdots P_{C_n} x_k
</math>
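The update above is straightforward to implement whenever each projection has a closed form. The following is a minimal sketch using NumPy (the sets, data, and iteration count are chosen here for illustration, not taken from the article): it alternates projections onto a hyperplane and a Euclidean ball in <math>\mathbb{R}^2</math>.

<syntaxhighlight lang="python">
import numpy as np

# Minimal POCS sketch: find a point in the intersection of
#   C1 = {x : a^T x = b}     (a hyperplane)
#   C2 = {x : ||x||_2 <= r}  (a Euclidean ball)
# Both projection operators are known in closed form.

def project_hyperplane(x, a, b):
    # P_{C1}(x) = x - ((a^T x - b) / ||a||^2) * a
    return x - ((a @ x - b) / (a @ a)) * a

def project_ball(x, r):
    # P_{C2}(x) = x if x is inside the ball, else its radial projection
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

a, b, r = np.array([1.0, 2.0]), 3.0, 2.0  # example data, chosen arbitrarily
x = np.array([5.0, -4.0])                 # arbitrary starting point
for _ in range(100):                      # x_{k+1} = P_{C1} P_{C2} x_k
    x = project_hyperplane(project_ball(x, r), a, b)
print(x)  # approaches a point in C1 ∩ C2 (non-empty for this data)
</syntaxhighlight>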
 
For every <math>x \in \mathbb{R}^N </math>, the minimization problem
:<math>
\text{minimize}_{y \in C} \quad f(y) + \frac{1}{2} \left\| x-y \right\|_2^2
</math>
admits a unique solution, which is denoted by <math>\operatorname{prox}_f(x)</math>.
 
: <math>
\operatorname{prox}_f : \mathbb{R}^N \rightarrow \mathbb{R}^N
</math>
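For many functions of practical interest the proximity operator has a closed form. A standard example (stated here for illustration): for <math>f(y) = \lambda \|y\|_1</math> with <math>\lambda > 0</math> and <math>C = \mathbb{R}^N</math>, the proximity operator is the componentwise soft-thresholding map

: <math>
\left( \operatorname{prox}_f(x) \right)_i = \operatorname{sign}(x_i) \max( |x_i| - \lambda,\, 0 ).
</math>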
Line 84 ⟶ 82:
The proximity operator of <math>f</math> is characterized by the inclusion
 
: <math>
p = \operatorname{prox}_f(x) \Leftrightarrow x - p \in \partial f(p) \qquad (\forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N)
</math>
 
If <math>f</math> is differentiable, then the above equation reduces to
 
: <math>
p = \operatorname{prox}_f(x) \Leftrightarrow x - p = \nabla f(p) \qquad (\forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N)
</math>
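As a quick check of this characterization (a worked example, not from the original text), take <math>f(y) = \tfrac{1}{2} \|y\|_2^2</math>, so that <math>\nabla f(p) = p</math>. The condition <math>x - p = \nabla f(p)</math> gives <math>x - p = p</math>, hence

: <math>
\operatorname{prox}_f(x) = \tfrac{x}{2},
</math>

which can also be verified directly by minimizing <math>\tfrac{1}{2} \|y\|_2^2 + \tfrac{1}{2} \|x-y\|_2^2</math> over <math>y</math>.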
 
Special instances of proximal gradient methods are
*[[Landweber iteration|Projected Landweber]]
*[[Alternating projection]]
*[[Alternating direction method of multipliers]]
*Fast Iterative Shrinkage-Thresholding Algorithm (FISTA)
 
== See also ==
* [[Alternating projection]]
* [[Convex optimization]]
* [[Frank–Wolfe algorithm]]
* [[Proximal gradient methods for learning]]