{{Short description|Concept in convex optimization}}
'''Subgradient methods''' are [[convex optimization]] methods which use [[Subderivative|subderivatives]]. Originally developed by [[Naum Z. Shor]] and others in the 1960s and 1970s, subgradient methods are convergent when applied even to a non-differentiable objective function. When the objective function is differentiable, subgradient methods for unconstrained problems use the same search direction as the method of [[gradient descent]].
 
Subgradient methods are slower than Newton's method when applied to minimize twice continuously differentiable convex functions. However, Newton's method fails to converge on problems that have non-differentiable kinks.
==Classical subgradient rules==
 
Let <math>f : \Reals^n \to \Reals</math> be a [[convex function]] with ___domain <math>\Reals^n</math>.
A classical subgradient method iterates
<math display=block>x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}</math>
where <math>g^{(k)}</math> denotes ''any'' [[subgradient]] of <math>f</math> at <math>x^{(k)}</math>, and <math>x^{(k)}</math> is the <math>k^{th}</math> iterate of <math>x</math>.
If <math>f</math> is differentiable, then its only subgradient is the gradient vector <math>\nabla f</math> itself.
It may happen that <math>-g^{(k)}</math> is not a descent direction for <math>f</math> at <math>x^{(k)}</math>. We therefore maintain a list <math>f_{\rm{best}}</math> that keeps track of the lowest objective function value found so far, i.e.
<math display=block>f_{\rm{best}}^{(k)} = \min\{f_{\rm{best}}^{(k-1)} , f(x^{(k)}) \}.</math>
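
For illustration, a minimal Python sketch of this iteration follows; the least-absolute-deviations objective <math>f(x) = \lVert Ax - b \rVert_1</math>, the step-size rule <math>\alpha_k = 1/k</math>, and all names are assumptions made for the example, not part of the method itself.

<syntaxhighlight lang="python">
import numpy as np

def subgradient_method(A, b, iters=500):
    """Classical subgradient method for f(x) = ||Ax - b||_1.

    A.T @ sign(Ax - b) is a subgradient of f at x; the step size
    1/k is square summable but not summable.
    """
    x = np.zeros(A.shape[1])
    x_best, f_best = x.copy(), np.inf
    for k in range(1, iters + 1):
        r = A @ x - b
        f = np.abs(r).sum()
        if f < f_best:                   # track the lowest value found so far
            f_best, x_best = f, x.copy()
        g = A.T @ np.sign(r)             # any subgradient of f at x^(k)
        x = x - (1.0 / k) * g            # not a descent step in general
    return x_best, f_best

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
x_best, f_best = subgradient_method(A, b)
</syntaxhighlight>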
 
===Step size rules===
 
*Constant step size, <math>\alpha_k = \alpha.</math>
*Constant step length, <math>\alpha_k = \gamma/\lVert g^{(k)} \rVert_2</math>, which gives <math>\lVert x^{(k+1)} - x^{(k)} \rVert_2 = \gamma.</math>
*Square summable but not summable step size, i.e. any step sizes satisfying <math display="block">\alpha_k\geq0,\qquad\sum_{k=1}^\infty \alpha_k^2 < \infty,\qquad \sum_{k=1}^\infty \alpha_k = \infty.</math>
*Nonsummable diminishing, i.e. any step sizes satisfying <math display="block">\alpha_k \geq 0,\qquad \lim_{k\to\infty} \alpha_k = 0,\qquad \sum_{k=1}^\infty \alpha_k = \infty.</math>
*Nonsummable diminishing step lengths, i.e. <math>\alpha_k = \gamma_k/\lVert g^{(k)} \rVert_2</math>, where <math display="block">\gamma_k \geq 0,\qquad \lim_{k\to\infty} \gamma_k = 0,\qquad \sum_{k=1}^\infty \gamma_k = \infty.</math>
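
For illustration, the five rules above might be written as simple Python schedules; the particular constants (such as <math>\gamma = 0.1</math> and the <math>1/\sqrt{k}</math> decay) are arbitrary assumptions for the example.

<syntaxhighlight lang="python">
import math

# Illustrative instances of the five "off-line" rules above (k = 1, 2, ...).
def constant_step_size(k, alpha=0.1):
    return alpha                           # alpha_k = alpha

def square_summable(k):
    return 1.0 / k                         # sum alpha_k^2 < inf, sum alpha_k = inf

def nonsummable_diminishing(k):
    return 0.1 / math.sqrt(k)              # alpha_k -> 0, sum alpha_k = inf

# The two step-*length* rules divide by the subgradient norm ||g^(k)||_2:
def constant_step_length(k, g_norm, gamma=0.1):
    return gamma / g_norm                  # gives ||x^(k+1) - x^(k)||_2 = gamma

def diminishing_step_length(k, g_norm):
    return (1.0 / math.sqrt(k)) / g_norm   # gamma_k = 1/sqrt(k) -> 0, sum gamma_k = inf
</syntaxhighlight>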
For all five rules, the step-sizes are determined "off-line", before the method is iterated; the step-sizes do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step-size rules used for descent methods for differentiable functions: Many methods for minimizing differentiable functions satisfy Wolfe's sufficient conditions for convergence, where step-sizes typically depend on the current point and the current search-direction. An extensive discussion of stepsize rules for subgradient methods, including incremental versions, is given in the books by Bertsekas<ref>{{cite book
| last = Bertsekas
| first = Dimitri P.
| author-link = Dimitri Bertsekas
| title = Convex Optimization Algorithms
| publisher = Athena Scientific
| ___location = Belmont, MA
| year = 2015
}}
</ref> Contemporary bundle-methods often use "[[level set|level]] control" rules for choosing step-sizes, developing techniques from the "subgradient-projection" method of Boris T. Polyak (1969). However, there are problems on which bundle methods offer little advantage over subgradient-projection methods.<ref name="Lem">
{{cite book| last=Lemaréchal|first=Claude|author-link=Claude Lemaréchal|chapter=Lagrangian relaxation|pages=112–156|title=Computational combinatorial optimization: Papers from the Spring School held in Schloß Dagstuhl, May 15–19, 2000|editor=Michael Jünger and Denis Naddef|series=Lecture Notes in Computer Science|volume=2241|publisher=Springer-Verlag| ___location=Berlin|year=2001|isbn=3-540-42877-1|mr=1900016|doi=10.1007/3-540-45586-8_4|s2cid=9048698 }}</ref><ref name="KLL">
{{cite journal|last1=Kiwiel|first1=Krzysztof&nbsp;C.|last2=Larsson|first2=Torbjörn|last3=Lindberg|first3=P.&nbsp;O.|title=Lagrangian relaxation via ballstep subgradient methods|url=http://rcin.org.pl/Content/139438/PDF/RB-2002-76.pdf|journal=[[Mathematics of Operations Research]]|volume=32|date=August 2007|number=3|pages=669–686|mr=2348241|doi=10.1287/moor.1070.0261}}
</ref>
 
==Constrained optimization==
===Projected subgradient===
One extension of the subgradient method is the '''projected subgradient method''', which solves the constrained [[Mathematical optimization|optimization]] problem
:minimize <math>f(x)</math> subject to
<math display=block>x \in \mathcal{C}</math>

where <math>\mathcal{C}</math> is a [[convex set]]. The projected subgradient method uses the iteration
<math display=block>x^{(k+1)} = P \left(x^{(k)} - \alpha_k g^{(k)} \right)</math>
where <math>P</math> is projection on <math>\mathcal{C}</math> and <math>g^{(k)}</math> is any subgradient of <math>f</math> at <math>x^{(k)}.</math>
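
A minimal Python sketch of the projected iteration, assuming <math>\mathcal{C}</math> is a box <math>\{x : l \leq x \leq u\}</math> so that the projection <math>P</math> is a coordinatewise clip; the objective and step size are illustrative assumptions.

<syntaxhighlight lang="python">
import numpy as np

def projected_subgradient(A, b, lo, hi, iters=500):
    """Projected subgradient method for f(x) = ||Ax - b||_1 over the
    box C = {x : lo <= x <= hi}; here P is a coordinatewise clip."""
    x = np.clip(np.zeros(A.shape[1]), lo, hi)    # start from a feasible point
    x_best, f_best = x.copy(), np.inf
    for k in range(1, iters + 1):
        r = A @ x - b
        f = np.abs(r).sum()
        if f < f_best:                           # best feasible value so far
            f_best, x_best = f, x.copy()
        g = A.T @ np.sign(r)                     # a subgradient of f at x
        x = np.clip(x - (1.0 / k) * g, lo, hi)   # x <- P(x - alpha_k g)
    return x_best, f_best
</syntaxhighlight>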
 
===General constraints===
The subgradient method can be extended to solve the inequality constrained problem
 
:minimize <math>f_0(x)</math> subject to
<math display=block>f_i (x) \leq 0,\quad i = 1,\ldots,m</math>

where <math>f_i</math> are convex. The algorithm takes the same form as the unconstrained case
<math display=block>x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}</math>
where <math>\alpha_k>0</math> is a step size, and <math>g^{(k)}</math> is a subgradient of the objective or one of the constraint functions at <math>x.</math> Take
<math display=block>g^{(k)} =
\begin{cases}
\partial f_0 (x) & \text{ if } f_i(x) \leq 0 \; \forall i = 1 \dots m \\
\partial f_j (x) & \text{ for some } j \text{ such that } f_j(x) > 0
\end{cases}</math>
where <math>\partial f</math> denotes the [[subdifferential]] of <math>f</math>. If the current point is feasible, the algorithm uses an objective subgradient; if the current point is infeasible, the algorithm chooses a subgradient of any violated constraint.
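
A minimal Python sketch of this rule, assuming an illustrative quadratic objective <math>f_0(x) = \lVert x - x_0 \rVert_2^2</math> and linear constraints <math>Ax \leq b</math>; these choices are assumptions for the example only.

<syntaxhighlight lang="python">
import numpy as np

def constrained_subgradient(x0, A, b, iters=1000):
    """Subgradient method for: minimize ||x - x0||_2^2 subject to A x <= b.

    Feasible iterate: step along the objective's gradient.
    Infeasible iterate: step along a violated constraint's gradient.
    """
    x = np.zeros_like(x0)
    x_best, f_best = None, np.inf                # best *feasible* point found
    for k in range(1, iters + 1):
        violation = A @ x - b
        j = int(np.argmax(violation))
        if violation[j] <= 0:                    # x is feasible
            f = float(np.sum((x - x0) ** 2))
            if f < f_best:
                f_best, x_best = f, x.copy()
            g = 2.0 * (x - x0)                   # gradient of the objective
        else:                                    # x violates constraint j
            g = A[j]                             # gradient of a_j^T x - b_j
        x = x - (1.0 / k) * g                    # same step as the unconstrained case
    return x_best, f_best
</syntaxhighlight>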
 
==See also==
 
* {{annotated link|Stochastic gradient descent}}
 
==References==
 
{{Reflist}}
 
==Further reading==
 
* {{cite book
| last = Bertsekas
| first = Dimitri P.
| author-link = Dimitri Bertsekas
| title = Nonlinear Programming
| edition = Second
| publisher = Athena Scientific
| ___location = Belmont, MA
| year = 1999
| isbn = 1-886529-00-0
}}
 
==External links==
 
* [http://www.stanford.edu/class/ee364a/ EE364A] and [http://www.stanford.edu/class/ee364b/ EE364B], Stanford's convex optimization course sequence.
 
{{Convex analysis and variational analysis}}
{{optimization algorithms|convex}}
 
[[Category:Convex analysis]]
[[Category:Convex optimization]]
[[Category:Mathematical optimization]]
[[Category:Optimization algorithms and methods]]