'''Stochastic gradient descent''' (often abbreviated '''SGD''') is an [[iterative method]] for [[Mathematical optimization|optimizing]] an [[objective function]] with suitable [[smoothness]] properties (e.g. [[Differentiable function|differentiable]] or [[Subgradient method|subdifferentiable]]). It can be regarded as a [[stochastic approximation]] of [[gradient descent]] optimization, since it replaces the actual gradient (calculated from the entire [[data set]]) by an estimate thereof (calculated from a randomly selected subset of the data).<ref>{{cite book |first=Léon |last=Bottou |authorlink=Léon Bottou |first2=Olivier |last2=Bousquet |chapter=The Tradeoffs of Large Scale Learning |title=Optimization for Machine Learning |editor-first=Suvrit |editor-last=Sra |editor2-first=Sebastian |editor2-last=Nowozin |editor3-first=Stephen J. |editor3-last=Wright |___location=Cambridge |publisher=MIT Press |year=2012 |isbn=978-0-262-01646-9 |pages=351–368 |chapterurl=https://books.google.com/books?id=JPQx7s2L1A8C&pg=PA351 }}</ref>
While the basic idea behind stochastic approximation can be traced back to the [[Robbins–Monro algorithm]] of the 1950s, stochastic gradient descent has become an important optimization method in [[machine learning]].
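
A minimal sketch of the update rule (the symbols <math>Q_i</math>, <math>w</math> and <math>\eta</math> are introduced here for illustration and are not defined in this excerpt): if the objective is an average of per-example losses,

:<math>Q(w) = \frac{1}{n}\sum_{i=1}^{n} Q_i(w),</math>

then ordinary gradient descent updates the parameters with the full gradient, <math>w \leftarrow w - \eta\,\nabla Q(w)</math>, whereas stochastic gradient descent replaces the full gradient with the gradient of a single randomly chosen term (or of a small mini-batch),

:<math>w \leftarrow w - \eta\,\nabla Q_i(w),</math>

where <math>\eta</math> is the learning rate (step size).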
==Background==