Stochastic gradient descent

'''Stochastic gradient descent''' (often shortened to '''SGD'''), also known as '''incremental''' gradient descent, is an [[iterative method]] for [[Mathematical optimization|optimizing]] a [[Differentiable function|differentiable]] [[objective function]]; it is a [[stochastic approximation]] of [[gradient descent]] optimization. A recent article<ref>{{cite journal | last = Mei | first = Song | title = A mean field view of the landscape of two-layer neural networks | journal = Proceedings of the National Academy of Sciences | year = 2018 | doi = 10.1073/pnas.1806579115 }}</ref> implicitly credits [[Herbert Robbins]] and Sutton Monro with developing SGD in their 1951 article "A Stochastic Approximation Method"; see [[Stochastic approximation]] for more information. It is called '''stochastic''' because samples are selected randomly (or shuffled) rather than processed as a single group (as in standard [[gradient descent]]) or in the order they appear in the training set.
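
The following is a minimal illustrative sketch, not a prescribed implementation: it assumes a least-squares objective on synthetic data, and the variable names, learning rate, and epoch count are arbitrary choices made for the example. It shows the defining feature described above: each update uses a single, randomly ordered training example rather than the full training set.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative data: 100 examples, 3 features, generated from assumed "true" weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)          # parameters to be learned
learning_rate = 0.01     # step size (an assumed value for this sketch)

for epoch in range(50):
    # Shuffle so examples are visited in random order, not training-set order.
    for i in rng.permutation(len(X)):
        # Gradient of the per-example squared error (x_i . w - y_i)^2 / 2.
        grad = (X[i] @ w - y[i]) * X[i]
        # Update the parameters using this single example only.
        w -= learning_rate * grad
</syntaxhighlight>

In contrast, standard (batch) gradient descent would compute the gradient over all 100 examples before making each update.
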
 
== Background ==