'''Stochastic gradient descent''' (often abbreviated '''SGD''') is an [[iterative method]] for [[Mathematical optimization|optimizing]] an [[objective function]] with suitable [[smoothness]] properties (e.g. [[Differentiable function|differentiable]] or [[Subgradient method|subdifferentiable]]). It can be regarded as a [[stochastic approximation]] of [[gradient descent]] optimization, since it replaces the actual gradient (calculated from the entire [[data set]]) by an estimate thereof (calculated from a randomly selected subset of the data).<ref>{{cite book |first=Léon |last=Bottou |authorlink=Léon Bottou |first2=Olivier |last2=Bousquet |chapter=The Tradeoffs of Large Scale Learning |title=Optimization for Machine Learning |editor-first=Suvrit |editor-last=Sra |editor2-first=Sebastian |editor2-last=Nowozin |editor3-first=Stephen J. |editor3-last=Wright |___location=Cambridge |publisher=MIT Press |year=2012 |isbn=978-0-262-01646-9 |pages=351–368 |chapterurl=https://books.google.com/books?id=JPQx7s2L1A8C&pg=PA351 }}</ref>
While the basic idea behind stochastic approximation can be traced back to the [[Robbins–Monro algorithm]] of the 1950s, stochastic gradient descent has become an important optimization method in [[machine learning]].
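
A minimal sketch of the update rule (the symbols <math>Q_i</math>, <math>w</math> and <math>\eta</math> are introduced here for illustration and are not defined in this excerpt): if the objective is an average of per-example losses,

:<math>Q(w) = \frac{1}{n}\sum_{i=1}^{n} Q_i(w),</math>

then ordinary gradient descent updates the parameters with the full gradient, <math>w \leftarrow w - \eta\,\nabla Q(w)</math>, whereas stochastic gradient descent replaces the full gradient with the gradient of a single randomly chosen term (or of a small mini-batch),

:<math>w \leftarrow w - \eta\,\nabla Q_i(w),</math>

where <math>\eta</math> is the learning rate (step size).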
==Background==