{{Short description|Form of projection}}
{{more footnotes|date=November 2013}}
 
'''Proximal gradient methods''' are a generalized form of projection used to solve non-differentiable [[convex optimization]] problems.
[[File:Frank_Wolfe_vs_Projected_Gradient.webm|thumb|A comparison between the iterates of the projected gradient method (in red) and the [[Frank–Wolfe algorithm|Frank–Wolfe method]] (in green).]]
[[Convex optimization]] is a subfield of optimization whose problems can be solved reliably and exactly. Many interesting problems, including many in signal processing, can be formulated as convex optimization problems of the form
 
:<math>
\operatorname{minimize}_{x \in \mathbb{R}^N} \quad f_1(x) + f_2(x) + \cdots + f_{n-1}(x) + f_n(x)
</math>
 
where <math>f_1, f_2, \ldots, f_n</math> are possibly non-differentiable [[convex functions]] <math>f_i: \mathbb{R}^N \rightarrow \mathbb{R}</math>. The lack of differentiability rules out conventional smooth optimization techniques like the [[Gradient descent|steepest descent method]] and the [[conjugate gradient method]], but proximal gradient methods can be used instead.

Proximal gradient methods start with a splitting step, in which the functions <math>f_1, \ldots, f_n</math> are used individually so as to yield an easily [[wikt:implementable|implementable]] algorithm. They are called [[proximal]] because each non-differentiable function among <math>f_1, \ldots, f_n</math> is involved via its [[Proximal operator|proximity operator]]. The iterative shrinkage-thresholding algorithm,<ref>{{cite journal | last1=Daubechies | first1=I | last2=Defrise | first2=M | last3=De Mol | first3=C |author3-link=Christine De Mol | title=An iterative thresholding algorithm for linear inverse problems with a sparsity constraint |journal=Communications on Pure and Applied Mathematics |volume=57 | issue=11 |year=2004 |pages=1413–1457 | bibcode=2003math......7152D | arxiv=math/0307152 |doi=10.1002/cpa.20042}}</ref> [[Landweber iteration|projected Landweber]], projected gradient, [[alternating projection]]s, [[Alternating direction method of multipliers#Alternating direction method of multipliers|alternating-direction method of multipliers]], and alternating split [[Bregman method|Bregman]] are special instances of proximal algorithms.<ref>Details of proximal methods are discussed in {{cite arXiv |last1=Combettes |first1=Patrick L. |last2=Pesquet |first2=Jean-Christophe |title=Proximal Splitting Methods in Signal Processing |year=2009 |eprint=0912.3522 |class=math.OC}}</ref>

For the theory of proximal gradient methods from the perspective of and with applications to [[statistical learning theory]], see [[proximal gradient methods for learning]].
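As a concrete illustration of the splitting idea, the sketch below applies a proximal gradient iteration to the lasso-type problem <math>\min_x \tfrac{1}{2}\|Ax-b\|_2^2 + \lambda\|x\|_1</math>: the smooth least-squares term is handled by a gradient step and the non-differentiable <math>\ell_1</math> term by its proximity operator (soft-thresholding). This is only a minimal sketch; the function names, the fixed step size <math>1/L</math>, and the iteration count are illustrative choices, not part of a standard implementation.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient_lasso(A, b, lam, num_iters=500):
    """Minimize 0.5 * ||A x - b||_2^2 + lam * ||x||_1 by proximal gradient steps."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, with L the Lipschitz constant of the gradient
    for _ in range(num_iters):
        grad = A.T @ (A @ x - b)                         # gradient step on the smooth term
        x = soft_threshold(x - step * grad, step * lam)  # proximal step on the non-smooth term
    return x
</syntaxhighlight>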
 
== Notations and Terminology ==
Let <math>\mathbb{R}^N</math> be the <math>N</math>-dimensional [[Euclidean space]], the ___domain of the function <math> f: \mathbb{R}^N \rightarrow \,]-\infty,+\infty]</math>, and let <math>C</math> be a non-empty convex subset of <math>\mathbb{R}^N</math>. The [[indicator function]] of <math>C</math> is defined as
 
:<math>
i_C : x \mapsto \begin{cases} 0 & \text{if } x \in C \\ +\infty & \text{if } x \notin C \end{cases}
</math>
 
The <math>p</math>-norm (<math>\| \cdot \|_p</math>) of <math>x \in \mathbb{R}^N</math> is defined as
 
:<math>
\| x \|_p = \left( |x_1|^p + |x_2|^p + \cdots + |x_N|^p \right)^{\frac{1}{p}}
</math>
 
The distance from <math>x \in \mathbb{R}^N</math> to <math>C</math> is defined as
 
:<math>
D_C(x) = \min_{y \in C} \| x - y \|_2
</math>
 
If <math>C</math> is closed and convex, the projection of <math>x \in \mathbb{R}^N</math> onto <math>C</math> is the unique point
<math>P_Cx \in C</math> such that <math>D_C(x) = \| x - P_Cx \|_2 </math>.

The [[subderivative|subdifferential]] of <math>f</math> is given by

:<math>
\partial f : \mathbb{R}^N \rightarrow 2^{\mathbb{R}^N} : x \mapsto \{ u \in \mathbb{R}^N \mid (\forall y \in \mathbb{R}^N)\ (y-x)^\mathrm{T}u + f(x) \leq f(y) \}.
</math>
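For a concrete case of the projection and distance defined above, take <math>C</math> to be the closed Euclidean ball of radius <math>r</math>, for which both <math>P_C</math> and <math>D_C</math> have closed forms. The snippet below is only an illustrative sketch; the function names and the choice of set are assumptions made here, not standard library routines.

<syntaxhighlight lang="python">
import numpy as np

def project_ball(x, r=1.0):
    """Projection P_C(x) onto the closed Euclidean ball C = {y : ||y||_2 <= r}."""
    norm = np.linalg.norm(x)
    return x if norm <= r else (r / norm) * x

def distance_ball(x, r=1.0):
    """Distance D_C(x) = ||x - P_C(x)||_2 from x to the ball C."""
    return np.linalg.norm(x - project_ball(x, r))
</syntaxhighlight>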
 
== Projection onto convex sets (POCS) ==
One of the widely used convex optimization algorithms is [[projections onto convex sets]] (POCS). This algorithm is employed to recover/synthesize a signal satisfying simultaneously several convex constraints. Let <math>f_i</math> be the indicator function of the non-empty closed convex set <math>C_i</math> modeling a constraint. This reduces to a convex feasibility problem, which requires us to find a solution lying in the intersection of all the convex sets <math>C_i</math>. In the POCS method each set <math>C_i</math> is incorporated by its [[projection operator]] <math>P_{C_i}</math>, so in each [[iteration]] <math>x</math> is updated as
:<math>
x_{k+1} = P_{C_1} P_{C_2} \cdots P_{C_n} x_k
</math>
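As an illustrative sketch of this iteration, the snippet below alternates the projections onto two example sets chosen here for concreteness (a box <math>C_1</math> and a half-space <math>C_2</math>, both of which have closed-form projections); the set choices and function names are assumptions made for this sketch.

<syntaxhighlight lang="python">
import numpy as np

def project_box(x, lo=-1.0, hi=1.0):
    """Projection onto the box C_1 = {y : lo <= y_i <= hi}."""
    return np.clip(x, lo, hi)

def project_halfspace(x, a, beta):
    """Projection onto the half-space C_2 = {y : a^T y <= beta}."""
    violation = a @ x - beta
    return x if violation <= 0 else x - (violation / (a @ a)) * a

def pocs(x0, a, beta, num_iters=100):
    """POCS iteration x_{k+1} = P_{C_1} P_{C_2} x_k."""
    x = x0
    for _ in range(num_iters):
        x = project_box(project_halfspace(x, a, beta))
    return x
</syntaxhighlight>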
 
 
However, beyond such problems [[projection operator]]s are not appropriate and more general operators are required to tackle them. Among the various generalizations of the notion of a convex projection operator that exist, proximity operators are best suited for other purposes.
 
== Definition ==
The [[Proximal operator|proximity operator]] of a function <math>f</math> is defined as follows: for every <math>x \in \mathbb{R}^N </math>, the minimization problem
 
:<math>
\operatorname{minimize}_{y \in \mathbb{R}^N} \quad f(y) + \frac{1}{2} \| x-y \|_2^2
</math>
admits a unique solution, which is denoted by <math>\operatorname{prox}_f(x)</math>. This defines the proximity operator

:<math>
\operatorname{prox}_f : \mathbb{R}^N \rightarrow \mathbb{R}^N.
</math>
 
The proximity operator of <math>f</math> is characterized by the inclusion
 
:<math>
p=\operatorname{prox}_f(x) \quad \Leftrightarrow \quad x-p \in \partial f(p) \qquad \forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N.
</math>
 
If <math>f</math> is differentiable, then the above equation reduces to
 
:<math>
p=\operatorname{prox}_f(x) \quad \Leftrightarrow \quad x-p = \nabla f(p) \qquad \forall (x,p) \in \mathbb{R}^N \times \mathbb{R}^N.
</math>
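As a numerical illustration of the inclusion characterization above (an assumed example using <math>f = \lambda\|\cdot\|_1</math>, whose proximity operator is soft-thresholding and whose subdifferential is known componentwise), one can check that <math>x - \operatorname{prox}_f(x)</math> indeed lies in <math>\partial f(\operatorname{prox}_f(x))</math>; the helper names below are hypothetical.

<syntaxhighlight lang="python">
import numpy as np

def prox_l1(x, lam):
    """Proximity operator of f = lam * ||.||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def in_subdifferential_l1(u, p, lam, tol=1e-12):
    """Check componentwise whether u belongs to the subdifferential of lam * ||.||_1 at p."""
    on_support = (p != 0) & (np.abs(u - lam * np.sign(p)) <= tol)  # u_i = lam * sign(p_i)
    off_support = (p == 0) & (np.abs(u) <= lam + tol)              # |u_i| <= lam
    return bool(np.all(on_support | off_support))

x = np.array([2.0, -0.3, 0.5, -4.0])
lam = 1.0
p = prox_l1(x, lam)
print(in_subdifferential_l1(x - p, p, lam))  # True: x - p lies in the subdifferential at p
</syntaxhighlight>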
 
== Examples ==
Special instances of proximal gradient methods are
*[[Landweber iteration|Projected Landweber]]
*[[Basis pursuit]]
*[[Alternating projection]]
*[[Alternating direction method of multipliers#Alternating direction method of multipliers|Alternating-direction method of multipliers]]
*Fast iterative shrinkage thresholding algorithm (FISTA)<ref>{{cite journal |last1=Beck |first1=A. |last2=Teboulle |first2=M. |title=A fast iterative shrinkage-thresholding algorithm for linear inverse problems |journal=SIAM Journal on Imaging Sciences |volume=2 |year=2009 |pages=183–202}}</ref>
 
== See also ==
* [[Proximal operator]]
* [[Alternating projection]]
* [[Convex optimization]]
* [[Frank–Wolfe algorithm]]
* [[Proximal gradient methods for learning]]
 
== Notes ==
{{reflist}}
 
== References ==
*{{cite book
 | last1 = Combettes
 | first1 = Patrick L.
 | last2 = Pesquet
 | first2 = Jean-Christophe
 | chapter = Proximal Splitting Methods in Signal Processing
 | title = Fixed-Point Algorithms for Inverse Problems in Science and Engineering
 | publisher = Springer
 | volume = 49
 | year = 2011
 | pages = 185–212
}}
 
 
==External links==
* Stephen Boyd and Lieven Vandenberghe, [https://web.stanford.edu/~boyd/cvxbook/ ''Convex Optimization''] (book)
* [https://web.stanford.edu/class/ee364a/ EE364a: Convex Optimization I] and [https://web.stanford.edu/class/ee364b/ EE364b: Convex Optimization II], Stanford course homepages
* [https://people.eecs.berkeley.edu/~elghaoui/Teaching/EE227A/lecture18.pdf EE227A: Lieven Vandenberghe Notes] Lecture 18
* [https://github.com/kul-forbes/ProximalOperators.jl ProximalOperators.jl]: a [[Julia (programming language)|Julia]] package implementing proximal operators.
* [https://github.com/kul-forbes/ProximalAlgorithms.jl ProximalAlgorithms.jl]: a [[Julia (programming language)|Julia]] package implementing algorithms based on the proximal operator, including the proximal gradient method.
* [http://proximity-operator.net/ Proximity Operator repository]: a collection of proximity operators implemented in [[Matlab]] and [[Python (programming language)|Python]].
 
[[Category:Gradient methods]]