'''Stochastic gradient descent''' (often abbreviated '''SGD''') is an [[iterative method]] for [[Mathematical optimization|optimizing]] an [[objective function]] with suitable [[smoothness]] properties (e.g. [[Differentiable function|differentiable]] or [[Subgradient method|subdifferentiable]]). It can be regarded as a [[stochastic approximation]] of [[gradient descent]] optimization, since it replaces the actual gradient (calculated from the entire [[data set]]) by an estimate thereof (calculated from a randomly selected subset of the data).<ref name="Taddy2019" /> Especially in [[big data]] applications, this reduces the [[Computational complexity|computational burden]], achieving faster iterations in exchange for a lower convergence rate.<ref>{{cite book |first=Léon |last=Bottou |authorlink=Léon Bottou |first2=Olivier |last2=Bousquet |chapter=The Tradeoffs of Large Scale Learning |title=Optimization for Machine Learning |editor-first=Suvrit |editor-last=Sra |editor2-first=Sebastian |editor2-last=Nowozin |editor3-first=Stephen J. |editor3-last=Wright |___location=Cambridge |publisher=MIT Press |year=2012 |isbn=978-0-262-01646-9 |pages=351–368 |chapterurl=https://books.google.com/books?id=JPQx7s2L1A8C&pg=PA351 }}</ref>
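To make the substitution concrete, the following is a minimal sketch of minibatch SGD applied to linear [[least squares]] (a hypothetical example; the objective, function name, and parameter values are illustrative assumptions, not drawn from the cited sources). At each step, the gradient is estimated from a randomly selected minibatch of rows rather than computed over the full data set:

<syntaxhighlight lang="python">
import numpy as np

def sgd_least_squares(X, y, lr=0.01, epochs=100, batch_size=32, seed=0):
    """Minimize the mean squared error (1/n) * ||X w - y||^2 with minibatch SGD.

    Hypothetical illustration: names and hyperparameters are not from the
    cited sources.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)                 # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Gradient estimated from the random minibatch,
            # not calculated from the entire data set.
            residual = X[batch] @ w - y[batch]
            grad = 2.0 * X[batch].T @ residual / len(batch)
            w -= lr * grad                         # step against the estimated gradient
    return w
</syntaxhighlight>

Replacing <code>batch</code> with all row indices would recover ordinary (full-batch) gradient descent; the minibatch version trades an unbiased but noisier gradient estimate for a much cheaper cost per iteration.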
While the basic idea behind stochastic approximation can be traced back to the [[Robbins–Monro algorithm]] of the 1950s,<ref>{{cite journal | last = Mei | first = Song | last2 = Montanari | first2 = Andrea | last3 = Nguyen | first3 = Phan-Minh | title = A mean field view of the landscape of two-layer neural networks | journal = Proceedings of the National Academy of Sciences | volume = 115| issue = 33| year = 2018 | pages = E7665–E7671| jstor = | doi = 10.1073/pnas.1806579115 | pmid = 30054315 | pmc = 6099898 | arxiv = 1804.06561 | bibcode = 2018arXiv180406561M }}</ref> stochastic gradient descent has become an important optimization method in [[machine learning]].<ref name="Taddy2019">{{cite book |first=Matt |last=Taddy |title=Business Data Science: Combining Machine Learning and Economics to Optimize, Automate, and Accelerate Business Decisions |chapter=Stochastic Gradient Descent |pages=303–307 |___location=New York |publisher=McGraw-Hill |year=2019 |isbn=978-1-260-45277-8 |chapterurl=https://books.google.com/books?id=yPOUDwAAQBAJ&pg=PA303 }}</ref>