Proximal gradient method
{{Short description|Form of projection}}
{{more footnotes|date=November 2013}}
 
'''Proximal gradient methods''' are a generalized form of projection used to solve non-differentiable [[convex optimization]] problems.
[[File:Frank_Wolfe_vs_Projected_Gradient.webm|thumb|A comparison between the iterates of the projected gradient method (in red) and the [[Frank–Wolfe algorithm|Frank-Wolfe method]] (in green).]]
 
Many interesting problems can be formulated as convex optimization problems of the form
 
<math>
\min_{\mathbf{x} \in \mathbb{R}^N} \sum_{i=1}^n f_i(\mathbf{x})
</math>
 
where <math>f_i: \mathbb{R}^N \rightarrow \mathbb{R},\ i = 1, \dots, n</math> are possibly non-differentiable [[convex functions]]. The lack of differentiability rules out conventional smooth optimization techniques like the [[Gradient descent|steepest descent method]] and the [[conjugate gradient method]], but proximal gradient methods can be used instead.
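As an illustration of how such a method proceeds, the following sketch (an assumed example, not taken from this article) minimizes a sum of a smooth term <math>\tfrac{1}{2}\|Ax-b\|_2^2</math> and a non-differentiable term <math>\lambda\|x\|_1</math> by alternating a gradient step on the smooth part with a proximity step on the non-smooth part. The names <code>A</code>, <code>b</code>, and <code>lam</code> are choices made for this example.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximity operator of t*||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, step, n_iter=500):
    """Minimize 0.5*||A x - b||^2 + lam*||x||_1 by proximal gradient steps."""
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                         # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)  # proximity step on the l1 part
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = A @ np.array([1.0, 0.0, -2.0, 0.0, 0.5])   # noiseless measurements of a sparse vector
step = 1.0 / np.linalg.norm(A, 2) ** 2         # 1/L, with L the Lipschitz constant of the gradient
x_hat = proximal_gradient(A, b, lam=0.1, step=step)
```

The step size <math>1/L</math>, where <math>L</math> is the Lipschitz constant of the gradient of the smooth term, is a standard choice guaranteeing convergence of such schemes.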
 
Proximal gradient methods start with a splitting step, in which the functions <math>f_1, \dots, f_n</math> are used individually so as to yield an easily [[wikt:implementable|implementable]] algorithm. They are called [[proximal]] because each non-differentiable function among <math>f_1, \dots, f_n</math> is involved via its [[Proximal operator|proximity operator]]. Well-known examples include the iterative shrinkage-thresholding algorithm, the projected [[Landweber iteration|Landweber]] method, projected gradient, and [[alternating projection]]s.
 
For the theory of proximal gradient methods from the perspective of and with applications to [[statistical learning theory]], see [[proximal gradient methods for learning]].
 
== Notations and terminology ==
Let <math>\mathbb{R}^N</math>, the <math>N</math>-dimensional [[Euclidean space]], be the ___domain of the function
<math> f: \mathbb{R}^N \rightarrow (-\infty,+\infty]</math>. Suppose <math>C</math> is a non-empty
convex subset of <math>\mathbb{R}^N</math>. Then, the indicator function of <math>C</math> is defined as
 
: <math> \iota_C : x \mapsto
\begin{cases}
0 & \text{if } x \in C \\
+ \infty & \text{if } x \notin C
\end{cases}
</math>
 
The <math>p</math>-norm <math>\| \cdot \|_p</math> is defined as
 
: <math>
\|x\|_p = ( |x_1|^p + |x_2|^p + \cdots + |x_N|^p )^{1/p}
</math>
 
The distance from <math>x \in \mathbb{R}^N</math> to <math>C</math> is defined as
 
: <math>
D_C(x) = \min_{y \in C} \|x - y\|_2
</math>
 
If <math>C</math> is closed and convex, the projection of <math>x \in \mathbb{R}^N</math> onto <math>C</math> is the unique point
<math>P_Cx \in C</math> such that <math>D_C(x) = \| x - P_Cx \|_2 </math>.
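As a small numeric illustration of these definitions (the choice of <math>C</math> here is an assumption made for the example), take <math>C</math> to be the closed unit ball in <math>\mathbb{R}^2</math>, for which the projection has the closed form <math>P_C(x) = x / \max(1, \|x\|_2)</math>.

```python
import numpy as np

def project_unit_ball(x):
    """Projection of x onto the closed unit ball {y : ||y||_2 <= 1}."""
    return x / max(1.0, np.linalg.norm(x))

def dist_to_unit_ball(x):
    """Distance D_C(x) from x to the unit ball."""
    return max(0.0, np.linalg.norm(x) - 1.0)

x = np.array([3.0, 4.0])      # ||x||_2 = 5, so x lies outside the ball
p = project_unit_ball(x)      # the unique closest point of C, on the boundary
# By definition, D_C(x) equals ||x - P_C(x)||_2
```

Here the distance <math>D_C(x) = 5 - 1 = 4</math> agrees with <math>\|x - P_C x\|_2</math>, as the definition requires.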
 
The [[subdifferential]] of <math>f</math> at <math>x</math> is given by
 
: <math>
\partial f(x) = \{ u \in \mathbb{R}^N \mid \forall y \in \mathbb{R}^N,\ (y-x)^\mathrm{T}u+f(x) \leq f(y) \}
</math>
 
== Projection onto convex sets (POCS) ==
One simple case is the problem of finding a point in the intersection of closed convex sets <math>C_1, \dots, C_n</math>; the method of projections onto convex sets generates iterates by applying the projections in sequence:

: <math>
x_{k+1} = P_{C_1} P_{C_2} \cdots P_{C_n} x_k
</math>
However, beyond such feasibility problems [[projection operator]]s are not appropriate, and more general operators are required. Among the various generalizations of the notion of a convex projection operator that exist, proximity operators are best suited for this purpose.
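The composed-projection iteration above can be sketched as follows; the two sets (a halfspace and a ball) are assumptions chosen for this illustration, not taken from the article.

```python
import numpy as np

# POCS sketch: find a point in the intersection of
# C1 = {x : x[0] >= 1} (a halfspace) and C2 = {x : ||x||_2 <= 2} (a ball),
# by composing the two projections at each iteration.

def P1(x):
    """Projection onto the halfspace {x : x[0] >= 1}."""
    y = x.copy()
    y[0] = max(y[0], 1.0)
    return y

def P2(x):
    """Projection onto the ball of radius 2 centered at the origin."""
    n = np.linalg.norm(x)
    return x if n <= 2.0 else 2.0 * x / n

x = np.array([-3.0, 5.0])     # starting point outside both sets
for _ in range(100):
    x = P1(P2(x))             # x_{k+1} = P_{C1} P_{C2} x_k
# x now lies (up to numerical tolerance) in both C1 and C2
```

Because the two sets here have a non-empty intersection, the composed projections settle at a common point of both sets; this is the standard convergence behavior of POCS for closed convex sets.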
 
== Definition ==
The [[proximal operator|proximity operator]] of a convex function <math>f</math> at <math>x</math> is defined as the unique solution to
:<math>
\operatorname{argmin}\limits_y \bigg( f(y) + \frac{1}{2} \left\| x-y \right\|_2^2 \bigg)
</math>
and is denoted <math>\operatorname{prox}_f(x)</math>. This defines an operator

: <math>
\operatorname{prox}_f : \mathbb{R}^N \rightarrow \mathbb{R}^N.
</math>
 
Note that in the specific case where <math>f</math> is the indicator function <math>\iota_C</math> of some convex set <math>C</math>
 
: <math>
\begin{align}
\operatorname{prox}_{\iota_C}(x)
&= \operatorname{argmin}\limits_y
\begin{cases}
\frac{1}{2} \left\| x-y \right\|_2^2 & \text{if } y \in C \\
+ \infty & \text{if } y \notin C
\end{cases} \\
&=\operatorname{argmin}\limits_{y \in C} \frac{1}{2} \left\| x-y \right\|_2^2 \\
&= P_C(x)
\end{align}
</math>
 
showing that the proximity operator is indeed a generalisation of the projection operator.
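This correspondence can be checked numerically. In the sketch below (an illustration with assumed choices, not from the article), <math>C = [0,1]</math> so that the prox of the indicator <math>\iota_C</math> is simply clipping, i.e. the projection onto <math>C</math>; by contrast, for the genuinely non-smooth <math>f(y) = \lambda |y|</math> the prox is the closed-form soft-thresholding map.

```python
import numpy as np

def prox_indicator_interval(x, lo=0.0, hi=1.0):
    """Prox of the indicator of [lo, hi]: argmin over y in C of 0.5*(x-y)^2,
    which is exactly the projection (clipping) onto the interval."""
    return float(np.clip(x, lo, hi))

def prox_abs(x, lam):
    """Prox of f(y) = lam*|y|: soft-thresholding, shrinking x toward 0 by lam."""
    return float(np.sign(x)) * max(abs(x) - lam, 0.0)

p1 = prox_indicator_interval(1.7)   # projection of 1.7 onto [0, 1]
p2 = prox_abs(1.7, 0.5)             # 1.7 shrunk toward zero by 0.5
```

The first operator ignores how far <math>x</math> is from the set and simply projects, while the second moves <math>x</math> a bounded distance toward the minimizer of <math>f</math>; both are instances of the single prox definition above.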
 
The proximity operator of <math>f</math> is characterized by the inclusion
 
: <math>
p=\operatorname{prox}_f(x) \Leftrightarrow x-p \in \partial f(p) \qquad (\forall(x,p) \in \mathbb{R}^N \times \mathbb{R}^N)
</math>
 
If <math>f</math> is differentiable, then the above inclusion reduces to the equation
 
: <math>
p=\operatorname{prox}_f(x) \Leftrightarrow x-p = \nabla f(p) \quad (\forall(x,p) \in \mathbb{R}^N \times \mathbb{R}^N)
</math>
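A worked check of the differentiable case (the choice of <math>f</math> is an assumption made for this example): for <math>f(p) = \tfrac{1}{2}p^2</math> we have <math>\nabla f(p) = p</math>, so the condition <math>x - p = \nabla f(p)</math> gives <math>p = x/2</math>; the prox simply halves its argument.

```python
def prox_half_square(x):
    """Prox of f(p) = 0.5*p**2, i.e. argmin_y 0.5*y**2 + 0.5*(x - y)**2.
    Setting the derivative y + (y - x) to zero gives y = x/2."""
    return x / 2.0

p = prox_half_square(3.0)
# optimality condition of the differentiable case: x - p == grad f(p) == p
residual = (3.0 - p) - p
```
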
 
== Examples ==
Special instances of proximal gradient methods include the projected [[Landweber iteration|Landweber]] method, the projected gradient method, [[alternating projection]]s, and the [[alternating direction method of multipliers]].
 
== See also ==
* [[Alternating projection]]
* [[Convex optimization]]
* [[Frank–Wolfe algorithm]]
* [[Proximal operator]]
* [[Proximal gradient methods for learning]]
 
== Notes ==