Gradient descent: Difference between revisions

 
===Fast gradient methods===
[[Yurii Nesterov]] has proposed<ref>{{cite book |first=Yurii |last=Nesterov |author-link=Yurii Nesterov |title=Introductory Lectures on Convex Optimization : A Basic Course |publisher=Springer |year=2004 |isbn=1-4020-7553-7 }}</ref> a simple modification that enables faster convergence for convex problems and has since been further generalized. For unconstrained smooth problems the method is called the [[fast gradient method]] (FGM) or the [[accelerated gradient method]] (AGM). Specifically, if the differentiable function <math>F</math> is convex and <math>\nabla F</math> is [[Lipschitz continuity|Lipschitz]], and it is not assumed that <math>F</math> is [[Convex function#Strongly convex functions|strongly convex]], then the error in the objective value generated at each step <math>k</math> by the gradient descent method is [[Big O notation|bounded by]] <math display="inline">\mathcal{O}\left(\tfrac{1}{k}\right)</math>. With Nesterov's acceleration technique, the error decreases at <math display="inline">\mathcal{O}\left(\tfrac{1}{k^{2}}\right)</math>.<ref>{{cite web |url=https://www.seas.ucla.edu/~vandenbe/236C/lectures/fgrad.pdf |title=Fast Gradient Methods |work=Lecture notes for EE236C at UCLA |first=Lieven |last=Vandenberghe |date=2019 }}</ref><ref>{{Cite journal |last=Walkington |first=Noel J. |date=2023 |title=Nesterov's Method for Convex Optimization |url=https://epubs.siam.org/doi/10.1137/21M1390037 |journal=SIAM Review |language=en |volume=65 |issue=2 |pages=539–562 |doi=10.1137/21M1390037 |issn=0036-1445}}</ref> The rate <math>\mathcal{O}\left({k^{-2}}\right)</math> for the decrease of the [[loss function|cost function]] is known to be optimal for first-order optimization methods, although the constant factor can still be reduced. The [[optimized gradient method]] (OGM)<ref>{{cite journal |first1=D. |last1=Kim |first2=J. A. |last2=Fessler |title=Optimized First-order Methods for Smooth Convex Minimization |journal=[[Mathematical Programming|Math. Prog.]] |volume=151 |issue=1–2 |pages=81–107 |year=2016 |doi=10.1007/s10107-015-0949-3 |pmid=27765996 |pmc=5067109 |arxiv=1406.5468 |s2cid=207055414 }}</ref> reduces that constant by a factor of two and is an optimal first-order method for large-scale problems.<ref>{{cite journal |first=Yoel |last=Drori |date=2017 |title=The Exact Information-based Complexity of Smooth Convex Minimization |journal=Journal of Complexity |volume=39 |pages=1–16 |doi=10.1016/j.jco.2016.11.001 |arxiv=1606.01424 |s2cid=205861966 }}</ref>
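The accelerated iteration can be sketched in a few lines. The following is an illustrative implementation (not taken from the cited sources): it uses the standard momentum sequence <math>t_{k+1} = \bigl(1 + \sqrt{1 + 4t_k^2}\bigr)/2</math> and a step size of <math>1/L</math>, where <math>L</math> is a Lipschitz constant of <math>\nabla F</math>; the function name and the quadratic test problem are chosen for illustration only.

```python
import numpy as np

def nesterov_agm(grad, x0, L, steps):
    """Illustrative sketch of Nesterov's accelerated gradient method
    for an L-smooth convex objective. x is the iterate, y the
    extrapolated ("look-ahead") point, t the momentum sequence."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(steps):
        x_next = y - grad(y) / L                        # gradient step at the look-ahead point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

# Example: minimize F(x) = 0.5 * x^T A x for an ill-conditioned A (L = 100).
A = np.diag([100.0, 1.0])
x = nesterov_agm(lambda v: A @ v, x0=[1.0, 1.0], L=100.0, steps=200)
```

For this quadratic, plain gradient descent with the same step size contracts the slow eigendirection only by a factor <math>(1 - 1/100)</math> per step, while the accelerated iterates approach the minimizer at the origin much faster, consistent with the <math>\mathcal{O}(1/k^2)</math> guarantee on the objective error.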
 
For constrained or non-smooth problems, Nesterov's FGM is called the [[fast proximal gradient method]] (FPGM), an acceleration of the [[proximal gradient method]].
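For the non-smooth case, the gradient step is combined with a proximal step. The following sketch (an assumption-laden illustration, not code from the cited sources) applies the accelerated proximal scheme in the style of FISTA to <math>\ell_1</math>-regularized least squares, where the proximal operator of the <math>\ell_1</math> norm is soft-thresholding; the function names are chosen for illustration only.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def accelerated_proximal_gradient(A, b, lam, steps):
    """Illustrative accelerated proximal gradient sketch for
    min_x 0.5 * ||Ax - b||^2 + lam * ||x||_1."""
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the smooth part's gradient
    x = np.zeros(A.shape[1])
    y = x.copy()
    t = 1.0
    for _ in range(steps):
        g = A.T @ (A @ y - b)           # gradient of the smooth term at the look-ahead point
        x_next = soft_threshold(y - g / L, lam / L)       # proximal (shrinkage) step
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x
```

When <math>A</math> is the identity, the minimizer is simply the soft-thresholded data, e.g. <code>accelerated_proximal_gradient(np.eye(3), np.array([3.0, 0.5, -2.0]), 1.0, 50)</code> recovers <math>(2, 0, -1)</math>.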