Subgradient method: Difference between revisions

Subgradient projection methods are often applied to large-scale problems with decomposition techniques. Such decomposition methods often allow a simple distributed method for a problem.
 
==Classical subgradient rules==
 
Let <math>f:\mathbb{R}^n \to \mathbb{R}</math> be a [[convex function]] with ___domain <math>\mathbb{R}^n</math>. A classical subgradient method iterates
:<math>x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)} \ </math>
where <math>g^{(k)}</math> denotes a [[subgradient]] of <math> f \ </math> at <math>x^{(k)} \ </math>. If <math>f \ </math> is differentiable, then its only subgradient is the gradient vector <math>\nabla f</math> itself.
It may happen that <math>-g^{(k)}</math> is not a descent direction for <math>f \ </math> at <math>x^{(k)}</math>. We therefore keep a running record <math>f_{\rm{best}} \ </math> of the lowest objective function value found so far, i.e.
:<math>f_{\rm{best}}^{(k)} = \min\{f_{\rm{best}}^{(k-1)} , f(x^{(k)}) \}.</math>
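As a concrete illustration, the iteration above can be sketched in Python for <math>f(x) = |x|</math>, whose subgradient is the sign of <math>x</math> (any value in <math>[-1,1]</math> is valid at the kink). The function names and the diminishing step sizes <math>\alpha_k = 1/(k+1)</math> are illustrative choices, not part of the article.

```python
# Sketch of the subgradient iteration x^{k+1} = x^k - alpha_k g^k for
# f(x) = |x|. The names and the step rule alpha_k = 1/(k+1) are
# illustrative choices, not prescribed by the method itself.

def f(x):
    return abs(x)

def subgrad(x):
    # A subgradient of |x|: sign(x), with 0 chosen at the kink x = 0.
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def subgradient_method(x0, steps):
    x = x0
    f_best = f(x0)  # lowest objective value found so far
    for k in range(steps):
        x = x - (1.0 / (k + 1)) * subgrad(x)   # subgradient step
        f_best = min(f_best, f(x))  # -g^k need not be a descent direction
    return x, f_best

x, f_best = subgradient_method(x0=5.0, steps=200)
```

Because <math>-g^{(k)}</math> need not decrease <math>f</math>, a later iterate can be worse than an earlier one, which is exactly why <math>f_{\rm{best}}^{(k)}</math> is tracked.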
===Step size rules===
 
Many different types of step-size rules are used by subgradient methods. This article notes five classical step-size rules for which convergence [[mathematical proof|proof]]s are known:
 
*Constant step size, <math>\alpha_k = \alpha.</math>
*Nonsummable diminishing step lengths, i.e. <math>\alpha_k = \gamma_k/\lVert g^{(k)} \rVert_2</math>, where
:<math>\gamma_k\geq0,\qquad \lim_{k\to\infty} \gamma_k = 0,\qquad \sum_{k=1}^\infty \gamma_k = \infty.</math>
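For instance, a minimal Python sketch of a nonsummable diminishing step-length rule: the choice <math>\gamma_k = 1/(k+1)</math> is a hypothetical example satisfying all three conditions (nonnegative, vanishing, divergent sum — the harmonic series), and the names `step_length` and `alpha` are illustrative.

```python
import math

def step_length(k):
    # gamma_k = 1/(k+1): nonnegative, tends to 0, and sums to infinity
    # (harmonic series) -- an illustrative nonsummable diminishing choice.
    return 1.0 / (k + 1)

def alpha(k, g):
    # alpha_k = gamma_k / ||g^k||_2, so each update moves a distance
    # of exactly gamma_k (a step *length*, not just a step size).
    norm = math.sqrt(sum(gi * gi for gi in g))
    return step_length(k) / norm
```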
For all five rules, the step-sizes are determined "off-line", before the method is iterated; the step-sizes do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step-size rules used for descent methods for differentiable functions: Many methods for minimizing differentiable functions satisfy Wolfe's sufficient conditions for convergence, where step-sizes typically depend on the current point and the current search-direction.
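By contrast, an "on-line" rule for a differentiable function chooses the step size at the current point and search direction. The backtracking line search below is a generic sketch of such a rule, enforcing the sufficient-decrease (Armijo) part of Wolfe's conditions; the function, constants, and the quadratic test case are illustrative, not taken from the article.

```python
def backtracking(f, grad_fx, x, d, alpha0=1.0, beta=0.5, c=1e-4):
    # Shrink alpha until the sufficient-decrease (Armijo) condition
    #   f(x + alpha d) <= f(x) + c * alpha * <grad f(x), d>
    # holds; the accepted step depends on the current point and direction.
    alpha = alpha0
    fx = f(x)
    slope = sum(g * di for g, di in zip(grad_fx, d))  # directional derivative
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + c * alpha * slope:
        alpha *= beta
    return alpha

# Illustrative use on f(x) = x^2 at x = 2 along the steepest-descent direction.
alpha = backtracking(lambda v: v[0] ** 2, [4.0], [2.0], [-4.0])
```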
 
===Convergence results===