Stochastic gradient descent

'''Stochastic gradient descent''' (often abbreviated '''SGD''') is an [[iterative method]] for [[Mathematical optimization|optimizing]] an [[objective function]] with suitable [[smoothness]] properties (e.g. [[Differentiable function|differentiable]] or [[Subgradient method|subdifferentiable]]). It can be regarded as a [[stochastic approximation]] of [[gradient descent]] optimization, since it replaces the actual gradient (calculated from the entire [[data set]]) with an estimate thereof (calculated from a randomly selected subset of the data). Especially in [[big data]] applications, this reduces the [[Computational complexity|computational burden]], achieving faster iterations in exchange for a lower convergence rate.<ref>{{cite book |first=Léon |last=Bottou |authorlink=Léon Bottou |first2=Olivier |last2=Bousquet |chapter=The Tradeoffs of Large Scale Learning |title=Optimization for Machine Learning |editor-first=Suvrit |editor-last=Sra |editor2-first=Sebastian |editor2-last=Nowozin |editor3-first=Stephen J. |editor3-last=Wright |___location=Cambridge |publisher=MIT Press |year=2012 |isbn=978-0-262-01646-9 |pages=351–368 |chapter-url=https://books.google.com/books?id=JPQx7s2L1A8C&pg=PA351 }}</ref>
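The idea above can be sketched in a few lines: instead of computing the gradient over all samples at each step, the update uses a gradient estimated from a small random subset. This is a minimal illustration for a least-squares objective, not an implementation from the cited sources; the function and parameter names (`sgd`, `lr`, `batch_size`) are chosen here for clarity.

```python
import numpy as np

def sgd(X, y, lr=0.1, batch_size=8, n_steps=500, seed=0):
    """Minimize f(w) = (1/2n) * ||Xw - y||^2 by stochastic gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_steps):
        # Gradient estimated from a random mini-batch instead of all n samples.
        idx = rng.choice(n, size=batch_size, replace=False)
        residual = X[idx] @ w - y[idx]
        grad = X[idx].T @ residual / batch_size
        w -= lr * grad
    return w

# Synthetic noiseless data with true weights [2.0, -1.0].
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([2.0, -1.0])
w = sgd(X, y)
```

Each iteration touches only `batch_size` rows of `X`, so its cost is independent of the data-set size `n`; the price is that each step follows a noisy estimate of the true gradient rather than the gradient itself.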
 
While the basic idea behind stochastic approximation can be traced back to the [[Robbins–Monro algorithm]] of the 1950s, stochastic gradient descent has become an important optimization method in [[machine learning]].<ref>{{Cite book
|last=Bottou
|first=Léon
|authorlink=Léon Bottou
|contribution=Online Algorithms and Stochastic Approximations
|year=1998
|title=Online Learning and Neural Networks
|publisher=Cambridge University Press
|url=https://archive.org/details/onlinelearningin0000unse
|isbn=978-0-521-65263-6
|url-access=registration
}}</ref><ref>{{cite journal
|last=Kiwiel
|first=Krzysztof C.
|title=Convergence and efficiency of subgradient methods for quasiconvex minimization
|journal=Mathematical Programming, Series A
|publisher=Springer|___location=Berlin, Heidelberg
|issn=0025-5610|pages=1–25|volume=90|issue=1
|doi=10.1007/PL00011414|year=2001 |mr=1819784}}</ref>
 
==Background==