{{Short description|Concept in convex optimization mathematics}}
'''Subgradient methods''' are [[convex optimization]] methods which use [[subderivative]]s. Originally developed by [[Naum Z. Shor]] and others in the 1960s and 1970s, subgradient methods are convergent when applied even to a non-differentiable objective function. When the objective function is differentiable, subgradient methods for unconstrained problems use the same search direction as the method of [[gradient descent|steepest descent]].
Subgradient methods are slower than Newton's method when applied to minimize twice continuously differentiable convex functions. However, Newton's method fails to converge on problems that have non-differentiable kinks.
In recent years, some [[interior-point methods]] have been suggested for convex minimization problems, but subgradient projection methods and related bundle methods of descent remain competitive. For convex minimization problems with a very large number of dimensions, subgradient-projection methods are suitable, because they require little storage.
Subgradient projection methods are often applied to large-scale problems with decomposition techniques. Such decomposition methods often allow a simple distributed method for a problem.
==Classical subgradient rules==
Let <math>f:\mathbb{R}^n \to \mathbb{R}</math> be a [[convex function]] with ___domain <math>\mathbb{R}^n.</math> A classical subgradient method iterates
<math display="block">x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)} \ </math>
where <math>g^{(k)}</math> denotes ''any'' [[subgradient]] of <math>f \ </math> at <math>x^{(k)}, \ </math> and <math>x^{(k)}</math> is the <math>k^{\text{th}}</math> iterate of <math>x.</math>
If <math>f \ </math> is differentiable, then its only subgradient is the gradient vector <math>\nabla f</math> itself. It may happen that <math>-g^{(k)}</math> is not a descent direction for <math>f \ </math> at <math>x^{(k)}.</math> We therefore maintain a list <math>f_{\rm{best}} \ </math> that keeps track of the lowest objective function value found so far, i.e. <math display="block">f_{\rm{best}}^{(k)} = \min\{f_{\rm{best}}^{(k-1)} , f(x^{(k)}) \}.</math>
===Step size rules===
*Constant step size, <math>\alpha_k = \alpha.</math>
*Constant step length, <math>\alpha_k = \gamma/\lVert g^{(k)} \rVert_2,</math> which gives <math>\lVert x^{(k+1)} - x^{(k)} \rVert_2 = \gamma.</math>
*Square summable but not summable step size, i.e. any step sizes satisfying <math display="block">\alpha_k\geq0,\qquad\sum_{k=1}^\infty \alpha_k^2 < \infty,\qquad \sum_{k=1}^\infty \alpha_k = \infty.</math>
*Nonsummable diminishing step size, i.e. any step sizes satisfying <math display="block">\alpha_k\geq0,\qquad \lim_{k\to\infty} \alpha_k = 0,\qquad \sum_{k=1}^\infty \alpha_k = \infty.</math>
*Nonsummable diminishing step lengths, i.e. <math>\alpha_k = \gamma_k/\lVert g^{(k)} \rVert_2,</math> where <math display="block">\gamma_k\geq0,\qquad \lim_{k\to\infty} \gamma_k = 0,\qquad \sum_{k=1}^\infty \gamma_k = \infty.</math>
For all five rules, the step-sizes are determined "off-line", before the method is iterated; the step-sizes do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step-size rules used for descent methods for differentiable functions: many methods for minimizing differentiable functions satisfy Wolfe's sufficient conditions for convergence, where step-sizes typically depend on the current point and the current search direction. An extensive discussion of stepsize rules for subgradient methods, including incremental versions, is given in the books by Bertsekas<ref>{{cite book
| last = Bertsekas
| first = Dimitri P.
| author-link = Dimitri P. Bertsekas
| title = Convex Optimization Algorithms
| edition = Second
| publisher = Athena Scientific
| year = 2015
| ___location = Belmont, MA.
| isbn = 978-1-886529-28-1
}}</ref> and by Bertsekas, Nedic, and Ozdaglar.<ref>{{cite book
|last1=Bertsekas
|first1=Dimitri P.
|last2=Nedic
|first2=Angelia
|last3 = Ozdaglar
|first3 = Asuman
| title = Convex Analysis and Optimization
| edition = Second
| publisher = Athena Scientific
| year = 2003
| ___location = Belmont, MA.
| isbn = 1-886529-45-0 }}</ref>
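As an illustration, the classical iteration with a square-summable but not summable step size <math>\alpha_k = 1/k</math> can be sketched as follows (an illustrative sketch, not drawn from the cited sources; the test function and all names are examples):

```python
# Illustrative sketch: classical subgradient method with step size
# alpha_k = 1/k (square summable but not summable), applied to the
# non-differentiable convex function f(x) = |x - 3|.

def subgradient_method(f, subgrad, x0, steps=1000):
    x = x0
    # The method is not a descent method, so track the best iterate found.
    f_best, x_best = f(x), x
    for k in range(1, steps + 1):
        x = x - (1.0 / k) * subgrad(x)  # x^(k+1) = x^(k) - alpha_k * g^(k)
        if f(x) < f_best:
            f_best, x_best = f(x), x
    return x_best, f_best

f = lambda x: abs(x - 3.0)
# A subgradient of |x - 3|: the sign of (x - 3), with 0 allowed at the kink.
subgrad = lambda x: (x > 3.0) - (x < 3.0)
x_best, f_best = subgradient_method(f, subgrad, x0=0.0)
```

Because the iterates can overshoot the kink at the minimizer, progress is read off <math>f_{\rm{best}}^{(k)}</math> rather than the last iterate.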
===Convergence results===
For constant step-size and scaled subgradients having Euclidean norm equal to one, the subgradient method converges to an arbitrarily close approximation to the minimum value, that is,
<math display="block">\lim_{k\to\infty} f_{\rm{best}}^{(k)} - f^* <\epsilon,</math>
by a result of Shor.<ref>
The approximate convergence of the constant step-size (scaled) subgradient method is stated as Exercise 6.3.14(a) in Bertsekas (page 636):
{{cite book
| last = Bertsekas
| first = Dimitri P.
| author-link = Dimitri P. Bertsekas
| title = Nonlinear Programming
| edition = Second
| publisher = Athena Scientific
| year = 1999
| ___location = Cambridge, MA.
| isbn = 1-886529-00-0
}} On page 636, Bertsekas attributes this result to Shor:
{{cite book
| last = Shor
| first = Naum Z.
| author-link = Naum Z. Shor
| title = Minimization Methods for Non-differentiable Functions
| publisher = [[Springer-Verlag]]
| year = 1985
}}
</ref>
These classical subgradient methods have poor performance and are no longer recommended for general use.<ref name="Lem"/><ref name="KLL"/> However, they are still used widely in specialized applications because they are simple and they can be easily adapted to take advantage of the special structure of the problem at hand.
==Subgradient-projection & bundle methods==
During the 1970s, [[Claude Lemaréchal]] and [[Philip Wolfe (mathematician)|Phil Wolfe]] proposed "bundle methods" of descent for problems of convex minimization.<ref>
{{cite book
| last = Bertsekas
| first = Dimitri P.
| author-link = Dimitri P. Bertsekas
| title = Nonlinear Programming
| edition = Second
| publisher = Athena Scientific
| year = 1999
| ___location = Cambridge, MA.
| isbn = 1-886529-00-0
}}
</ref> The meaning of the term "bundle methods" has changed significantly since that time. Modern versions and full convergence analysis were provided by Kiwiel.
<ref>
{{cite book|last=Kiwiel|first=Krzysztof|title=Methods of Descent for Nondifferentiable Optimization|publisher=[[Springer Verlag]]|___location=Berlin|year=1985|pages=362|isbn=978-3540156420 |mr=0797754}}
</ref> Contemporary bundle-methods often use "[[level set|level]] control" rules for choosing step-sizes, developing techniques from the "subgradient-projection" method of Boris T. Polyak (1969). However, there are problems on which bundle methods offer little advantage over subgradient-projection methods.<ref name="Lem">
{{cite book| last=Lemaréchal|first=Claude|author-link=Claude Lemaréchal|chapter=Lagrangian relaxation|pages=112–156|editor=Michael Jünger and Denis Naddef|title=Computational combinatorial optimization: Papers from the Spring School held in Schloß Dagstuhl, May 15–19, 2000|series=Lecture Notes in Computer Science|volume=2241|publisher=[[Springer-Verlag]]|___location=Berlin|year=2001|isbn=3-540-42877-1|mr=1900016 }}</ref><ref name="KLL">
{{cite journal|last1=Kiwiel|first1=Krzysztof C.|last2=Larsson|first2=Torbjörn|last3=Lindberg|first3=P. O.|title=Lagrangian relaxation via ballstep subgradient methods|journal=[[Mathematics of Operations Research]]|volume=32|issue=3|pages=669–686|year=2007|doi=10.1287/moor.1070.0261|mr=2348241}}
</ref>
==Constrained optimization==
===Projected subgradient===
One extension of the subgradient method is the '''projected subgradient method''', which solves the constrained [[Mathematical optimization|optimization]] problem
:minimize <math>f(x) \ </math> subject to
:<math>x\in\mathcal{C},</math>
where <math>\mathcal{C}</math> is a convex set. The projected subgradient method uses the iteration
:<math>x^{(k+1)} = P \left(x^{(k)} - \alpha_k g^{(k)} \right) </math>
where <math>P</math> is projection on <math>\mathcal{C}</math> and <math>g^{(k)}</math> is any subgradient of <math>f \ </math> at <math>x^{(k)}.</math>
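A minimal sketch of the projected subgradient iteration, under illustrative assumptions (not from the cited sources): the objective is <math>f(x) = \lVert x - a\rVert_1</math> and <math>\mathcal{C}</math> is the Euclidean unit ball, whose projection has a simple closed form.

```python
import numpy as np

# Illustrative sketch: projected subgradient method for
#   minimize ||x - a||_1   subject to   ||x||_2 <= 1.
# P is Euclidean projection onto the unit ball (rescale if outside);
# the elementwise sign is a subgradient of the l1 objective.

a = np.array([2.0, 0.0])

def project(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

x = np.zeros(2)
f_best, x_best = float(np.abs(x - a).sum()), x
for k in range(1, 2001):
    g = np.sign(x - a)              # a subgradient of ||. - a||_1 at x
    x = project(x - (1.0 / k) * g)  # x^(k+1) = P(x^(k) - alpha_k g^(k))
    fx = float(np.abs(x - a).sum())
    if fx < f_best:
        f_best, x_best = fx, x.copy()
```

Here the constrained minimizer is the boundary point of <math>\mathcal{C}</math> closest to <math>a</math>, and the iteration reaches it because every step is re-projected into the feasible set.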
The subgradient method can be extended to solve the inequality constrained problem
:minimize <math>f_0(x) \ </math> subject to
:<math>f_i (x) \leq 0,\quad i = 1,\ldots,m,</math>
where <math>f_i</math> are convex. The algorithm takes the same form as the unconstrained case
:<math>x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)} \ </math>
where <math>\alpha_k>0</math> is a step size, and <math>g^{(k)}</math> is a subgradient of the objective or one of the constraint functions at <math>x. \ </math> Take
:<math>g^{(k)} =
\begin{cases}
\partial f_0 (x) & \text{ if } f_i(x) \leq 0 \; \forall i = 1 \dots m \\
\partial f_j (x) & \text{ for some } j \text{ such that } f_j(x) > 0
\end{cases}</math>
where <math>\partial f</math> denotes the [[subdifferential]] of <math>f. \ </math> If the current point is feasible, the algorithm uses an objective subgradient; if the current point is infeasible, the algorithm chooses a subgradient of any violated constraint.

==See also==
* {{annotated link|Stochastic gradient descent}}
==References==
{{reflist}}
==Further reading==
* {{cite book
| last = Bertsekas
| first = Dimitri P.
 | author-link = Dimitri P. Bertsekas
| title = Nonlinear Programming
| publisher = Athena Scientific
| year = 1999
 | ___location = Cambridge, MA.
| isbn = 1-886529-00-0
}}
* {{cite book
|last1=Bertsekas
|first1=Dimitri P.
|last2=Nedic
|first2=Angelia
|last3 = Ozdaglar
|first3 = Asuman
| title = Convex Analysis and Optimization
| edition = Second
| publisher = Athena Scientific
| year = 2003
| ___location = Belmont, MA.
| isbn = 1-886529-45-0
}}
* {{cite book
| last = Bertsekas
| first = Dimitri P.
| title = Convex Optimization Algorithms
| publisher = Athena Scientific
| year = 2015
| ___location = Belmont, MA.
| isbn = 978-1-886529-28-1
}}
* {{cite book
| last = Shor
| first = Naum Z.
| title = Minimization Methods for Non-differentiable Functions
| publisher = [[Springer-Verlag]]
| year = 1985
}}
* {{cite book|last=Ruszczyński|author-link=Andrzej Piotr Ruszczyński|first=Andrzej|title=Nonlinear Optimization|publisher=[[Princeton University Press]]|___location=Princeton, NJ|year=2006|pages=xii+454|isbn=978-0691119151 |mr=2199043}}
==External links==
* [http://www.stanford.edu/class/ee364a/ EE364A] and [http://www.stanford.edu/class/ee364b/ EE364B], Stanford's convex optimization course sequence.
{{Convex analysis and variational analysis}}
{{optimization algorithms|convex}}
[[Category:Convex optimization]]
[[Category:Mathematical optimization]]
[[Category:Optimization algorithms and methods]]