Subgradient method: Difference between revisions

Subgradient projection methods are often applied to large-scale problems with decomposition techniques. Such decomposition methods often allow a simple distributed method for a problem.
 
==Classical subgradient rules==
 
Let <math>f:\mathbb{R}^n \to \mathbb{R}</math> be a [[convex function]] with ___domain <math>\mathbb{R}^n</math>. A classical subgradient method iterates
:<math>x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)} \ </math>
where <math>g^{(k)}</math> denotes a [[subgradient]] of <math> f \ </math> at <math>x^{(k)} \ </math>. If <math>f \ </math> is differentiable, then its only subgradient is the gradient vector <math>\nabla f</math> itself.
It may happen that <math>-g^{(k)}</math> is not a descent direction for <math>f \ </math> at <math>x^{(k)}</math>. We therefore keep a running record <math>f_{\rm{best}} \ </math> of the lowest objective function value found so far, i.e.
:<math>f_{\rm{best}}^{(k)} = \min\{f_{\rm{best}}^{(k-1)} , f(x^{(k)}) \}.</math>
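As a concrete illustration, the iteration above can be sketched in Python for <math>f(x) = |x|</math>, whose subgradient is the sign of <math>x</math> (any value in <math>[-1,1]</math> is valid at the kink). The function names and the diminishing step sizes <math>\alpha_k = 1/(k+1)</math> are illustrative choices, not part of the article.

```python
# Sketch of the subgradient iteration x^{k+1} = x^k - alpha_k g^k for
# f(x) = |x|. The names and the step rule alpha_k = 1/(k+1) are
# illustrative choices, not prescribed by the method itself.

def f(x):
    return abs(x)

def subgrad(x):
    # A subgradient of |x|: sign(x), with 0 chosen at the kink x = 0.
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def subgradient_method(x0, steps):
    x = x0
    f_best = f(x0)  # lowest objective value found so far
    for k in range(steps):
        x = x - (1.0 / (k + 1)) * subgrad(x)   # subgradient step
        f_best = min(f_best, f(x))  # -g^k need not be a descent direction
    return x, f_best

x, f_best = subgradient_method(x0=5.0, steps=200)
```

Because <math>-g^{(k)}</math> need not decrease <math>f</math>, a later iterate can be worse than an earlier one, which is exactly why <math>f_{\rm{best}}^{(k)}</math> is tracked.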
===Step size rules===
 
Many different types of step-size rules are used by subgradient methods. This article notes five classical step-size rules for which convergence [[mathematical proof|proof]]s are known:
 
*Constant step size, <math>\alpha_k = \alpha.</math>
*Nonsummable diminishing step lengths, i.e. <math>\alpha_k = \gamma_k/\lVert g^{(k)} \rVert_2</math>, where
:<math>\gamma_k\geq0,\qquad \lim_{k\to\infty} \gamma_k = 0,\qquad \sum_{k=1}^\infty \gamma_k = \infty.</math>
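For instance, a minimal Python sketch of a nonsummable diminishing step-length rule: the choice <math>\gamma_k = 1/(k+1)</math> is a hypothetical example satisfying all three conditions (nonnegative, vanishing, divergent sum — the harmonic series), and the names `step_length` and `alpha` are illustrative.

```python
import math

def step_length(k):
    # gamma_k = 1/(k+1): nonnegative, tends to 0, and sums to infinity
    # (harmonic series) -- an illustrative nonsummable diminishing choice.
    return 1.0 / (k + 1)

def alpha(k, g):
    # alpha_k = gamma_k / ||g^k||_2, so each update moves a distance
    # of exactly gamma_k (a step *length*, not just a step size).
    norm = math.sqrt(sum(gi * gi for gi in g))
    return step_length(k) / norm
```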
For all five rules, the step-sizes are determined "off-line", before the method is iterated; the step-sizes do not depend on preceding iterations. This "off-line" property of subgradient methods differs from the "on-line" step-size rules used for descent methods for differentiable functions: Many methods for minimizing differentiable functions satisfy Wolfe's sufficient conditions for convergence, where step-sizes typically depend on the current point and the current search-direction.
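By contrast, an "on-line" rule for a differentiable function chooses the step size at the current point and search direction. The backtracking line search below is a generic sketch of such a rule, enforcing the sufficient-decrease (Armijo) part of Wolfe's conditions; the function, constants, and the quadratic test case are illustrative, not taken from the article.

```python
def backtracking(f, grad_fx, x, d, alpha0=1.0, beta=0.5, c=1e-4):
    # Shrink alpha until the sufficient-decrease (Armijo) condition
    #   f(x + alpha d) <= f(x) + c * alpha * <grad f(x), d>
    # holds; the accepted step depends on the current point and direction.
    alpha = alpha0
    fx = f(x)
    slope = sum(g * di for g, di in zip(grad_fx, d))  # directional derivative
    while f([xi + alpha * di for xi, di in zip(x, d)]) > fx + c * alpha * slope:
        alpha *= beta
    return alpha

# Illustrative use on f(x) = x^2 at x = 2 along the steepest-descent direction.
alpha = backtracking(lambda v: v[0] ** 2, [4.0], [2.0], [-4.0])
```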
 
===Convergence results===