Stochastic gradient descent

'''Stochastic gradient descent''' (often abbreviated '''SGD''') is an [[iterative method]] for [[Mathematical optimization|optimizing]] an [[objective function]] with suitable [[smoothness]] properties (e.g. [[Differentiable function|differentiable]] or [[Subgradient method|subdifferentiable]]). It can be regarded as a [[stochastic approximation]] of [[gradient descent]] optimization, since it replaces the actual gradient (calculated from the entire [[data set]]) by an estimate thereof (calculated from a randomly selected subset of the data).<ref>{{cite book |first=Léon |last=Bottou |authorlink=Léon Bottou |first2=Olivier |last2=Bousquet |chapter=The Tradeoffs of Large Scale Learning |title=Optimization for Machine Learning |editor-first=Suvrit |editor-last=Sra |editor2-first=Sebastian |editor2-last=Nowozin |editor3-first=Stephen J. |editor3-last=Wright |___location=Cambridge |publisher=MIT Press |year=2012 |isbn=978-0-262-01646-9 |pages=351–368 |chapterurl=https://books.google.com/books?id=JPQx7s2L1A8C&pg=PA351 }}</ref>
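A minimal sketch of this idea in Python with NumPy, assuming a least-squares objective; the function name, learning rate, batch size, and epoch count below are illustrative choices, not from any particular library:

<syntaxhighlight lang="python">
import numpy as np

def sgd_least_squares(X, y, lr=0.01, batch_size=1, epochs=100, seed=0):
    """Minimize (1/n) * sum_i (x_i @ w - y_i)**2 with minibatch SGD."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        order = rng.permutation(n)  # one random pass over the data
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            # Gradient estimated from the random subset only, standing in
            # for the full-data gradient 2/n * X.T @ (X @ w - y).
            residual = X[batch] @ w - y[batch]
            grad = 2.0 * X[batch].T @ residual / len(batch)
            w -= lr * grad
    return w
</syntaxhighlight>

With <code>batch_size</code> equal to <code>n</code> this reduces to ordinary full-batch gradient descent; smaller batches trade gradient accuracy for cheaper per-step computation.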
 
While the basic idea behind stochastic approximation can be traced back to the [[Robbins–Monro algorithm]] of the 1950s,<ref name="rm">{{Cite journal |last1=Robbins |first1=H. |authorlink=Herbert Robbins |last2=Monro |first2=S. |title=A Stochastic Approximation Method |journal=The Annals of Mathematical Statistics |volume=22 |issue=3 |pages=400 |year=1951 |doi=10.1214/aoms/1177729586 }}</ref> stochastic gradient descent has become an important optimization method in [[machine learning]].<ref name="tz">{{Cite journal |last=Zhang |first=Tong |title=Solving large scale linear prediction problems using stochastic gradient descent algorithms |journal=Proceedings of the 21st international conference on machine learning (ICML'04) |pages=116 |year=2004 |doi=10.1145/1015330.1015332 }}</ref>
 
==Background==