Stochastic gradient descent

'''Stochastic gradient descent''' (often shortened to '''SGD'''), also known as '''incremental''' gradient descent, is an [[iterative method]] for [[Mathematical optimization|optimizing]] a [[Differentiable function|differentiable]] [[objective function]]; it is a [[stochastic approximation]] of [[gradient descent]] optimization. A recent article<ref>{{cite journal | last = Mei | first = Song | title = A mean field view of the landscape of two-layer neural networks | journal = Proceedings of the National Academy of Sciences | year = 2018 | doi = 10.1073/pnas.1806579115 }}</ref> implicitly credits [[Herbert Robbins]] and Sutton Monro with developing SGD in their 1951 article "A Stochastic Approximation Method"; see [[Stochastic approximation]] for more information. It is called '''stochastic''' because samples are selected randomly (or shuffled) rather than processed as a single group (as in standard [[gradient descent]]) or in the order they appear in the training set.
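
The following is a minimal illustrative sketch, not a prescribed implementation: it assumes a least-squares objective on synthetic data, and the variable names, learning rate, and epoch count are arbitrary choices made for the example. It shows the defining feature described above: each update uses a single, randomly ordered training example rather than the full training set.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative data: 100 examples, 3 features, generated from assumed "true" weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w = np.zeros(3)          # parameters to be learned
learning_rate = 0.01     # step size (an assumed value for this sketch)

for epoch in range(50):
    # Shuffle so examples are visited in random order, not training-set order.
    for i in rng.permutation(len(X)):
        # Gradient of the per-example squared error (x_i . w - y_i)^2 / 2.
        grad = (X[i] @ w - y[i]) * X[i]
        # Update the parameters using this single example only.
        w -= learning_rate * grad
</syntaxhighlight>

In contrast, standard (batch) gradient descent would compute the gradient over all 100 examples before making each update.
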
 
== Background ==