Gradient descent is generally attributed to [[Augustin-Louis Cauchy]], who first suggested it in 1847.<ref>{{cite journal |first=C. |last=Lemaréchal |author-link=Claude Lemaréchal |title=Cauchy and the Gradient Method |journal=Doc Math Extra |pages=251–254 |year=2012 |url=https://www.math.uni-bielefeld.de/documenta/vol-ismp/40_lemarechal-claude.pdf }}</ref> [[Jacques Hadamard]] independently proposed a similar method in 1907.<ref>{{Cite journal|last=Hadamard|first=Jacques|date=1908|title=Mémoire sur le problème d'analyse relatif à l'équilibre des plaques élastiques encastrées|journal=Mémoires présentés par divers savants étrangers à l'Académie des Sciences de l'Institut de France|volume=33}}</ref><ref>{{cite journal |last1=Courant |first1=R. |title=Variational methods for the solution of problems of equilibrium and vibrations |journal=Bulletin of the American Mathematical Society |date=1943 |volume=49 |issue=1 |pages=1–23 |doi=10.1090/S0002-9904-1943-07818-4 |doi-access=free }}</ref> Its convergence properties for non-linear optimization problems were first studied by [[Haskell Curry]] in 1944,<ref>{{cite journal |first=Haskell B. |last=Curry |title=The Method of Steepest Descent for Non-linear Minimization Problems |journal=Quart. Appl. Math. |volume=2 |year=1944 |issue=3 |pages=258–261 |doi=10.1090/qam/10667 |doi-access=free }}</ref> with the method becoming increasingly well-studied and used in the following decades.<ref name="BP" /><ref name="AK82" />
A simple extension of gradient descent, [[stochastic gradient descent]], serves as the most basic algorithm used for training most [[deep learning]] networks today.
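As a rough illustration of how stochastic gradient descent extends plain gradient descent, the following is a minimal Python sketch; the function names, learning rate, and toy data are illustrative assumptions, not taken from this article:

```python
import random

def sgd(grad_f, x0, data, lr=0.1, epochs=100):
    """Minimal stochastic gradient descent sketch (illustrative, not the
    canonical implementation): instead of the full gradient over all data,
    each step uses the gradient at a single data point."""
    x = x0
    for _ in range(epochs):
        random.shuffle(data)          # visit points in a random order
        for point in data:
            x = x - lr * grad_f(x, point)  # per-point update
    return x

# Toy objective: sum over data of (x - d)^2, whose minimizer is the mean.
grad = lambda x, d: 2 * (x - d)
result = sgd(grad, 0.0, [1.0, 2.0, 3.0])
# result settles near the data mean (2.0), oscillating slightly because
# each step follows only one point's gradient.
```

In practice the per-point gradient is averaged over a small "mini-batch" rather than a single point, which reduces the variance of the updates.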
==Description==