{{technical|date=June 2012}}
'''Stochastic approximation''' methods are a family of [[Iterative methods|iterative methods]] typically used for [[Root-finding|root-finding]] problems or for [[optimization problem|optimization]] problems. The recursive update rules of stochastic approximation methods can be used, among other things, for solving linear systems when the collected data is corrupted by noise, or for approximating [[Extremum|extreme values]] of functions which cannot be computed directly, but only estimated via noisy observations.
 
For instance, in engineering, many optimization problems are of this type when a mathematical model of the system is unavailable (for example, because the system is too complex), yet its behavior should still be optimized by adjusting certain parameters. For this purpose, experiments or simulations can be run to evaluate the performance of the system at given parameter values. Stochastic approximation algorithms have also been used in the social sciences to describe collective dynamics: fictitious play in learning theory and consensus algorithms can be studied using their theory.<ref>{{cite web|last1=Le Ny|first1=Jerome|title=Introduction to Stochastic Approximation Algorithms|url=http://www.professeurs.polymtl.ca/jerome.le-ny/teaching/DP_fall09/notes/lec11_SA.pdf|website=Polytechnique Montreal|publisher=Teaching Notes|accessdate=16 November 2016}}</ref> These methods are often used in statistics and machine learning, which typically must handle noisy measurements of empirical data. They are also related to [[stochastic optimization]] methods and algorithms.
 
In a nutshell, stochastic approximation algorithms deal with a function of the form <math display="inline"> f(\theta) = \operatorname E_{\xi} [F(\theta,\xi)] </math>
which is the expected value of a function depending on a [[random variable]] <math display="inline">\xi </math>. The goal is to recover properties of such a function <math display="inline">f</math> without evaluating it directly. Instead, stochastic approximation algorithms use random samples of <math display="inline">F(\theta,\xi)</math> to efficiently approximate properties of <math display="inline">f</math> such as zeros or extrema.
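As an illustration of this setup, the following is a minimal sketch of the Robbins–Monro recursion <math display="inline">\theta_{n+1} = \theta_n - a_n F(\theta_n,\xi_n)</math> for finding a zero of <math display="inline">f(\theta) = \operatorname E_{\xi}[F(\theta,\xi)]</math> using only noisy samples. The target function <math display="inline">f(\theta)=\theta-2</math> and the step-size choice <math display="inline">a_n = 1/n</math> are illustrative assumptions, not part of the original formulation.

```python
import random

def robbins_monro(noisy_f, theta0, n_steps=10000, a=1.0):
    """Iterate theta_{n+1} = theta_n - (a/n) * F(theta_n, xi_n).

    noisy_f(theta) must return an unbiased noisy sample of f(theta);
    the step sizes a/n satisfy sum a_n = inf, sum a_n^2 < inf.
    """
    theta = theta0
    for n in range(1, n_steps + 1):
        theta -= (a / n) * noisy_f(theta)
    return theta

# Hypothetical example: f(theta) = theta - 2 (root at theta = 2),
# observed through additive standard Gaussian noise.
random.seed(0)
root = robbins_monro(lambda t: (t - 2.0) + random.gauss(0.0, 1.0),
                     theta0=0.0)
```

With the decreasing step sizes <math display="inline">a_n = 1/n</math>, the noise in each sample is averaged out over the iterations, so the iterate approaches the root <math display="inline">\theta = 2</math> without the algorithm ever evaluating <math display="inline">f</math> itself.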
 
Stochastic approximations have found extensive applications in the
fields of statistics and machine learning, especially in settings with [[Big data]].
These applications range from [[stochastic optimization]] methods and algorithms,
to online forms of the [[Expectation–maximization algorithm|EM algorithm]], [[reinforcement learning]]
via [[Temporal difference learning|temporal differences]], and [[Deep learning|deep learning]].<ref>{{cite journal |last1=Toulis |first1=Panos |first2=Edoardo |last2=Airoldi|title=Scalable estimation strategies based on stochastic approximations: classical results and new insights |journal=Statistics and Computing |volume=25 |issue=4 |year=2015 |pages=781–795|url=https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4484776/|doi=10.1007/s11222-015-9560-y}}</ref>
 
The earliest, and prototypical, algorithms of this kind are the '''Robbins–Monro''' and '''Kiefer–Wolfowitz''' algorithms introduced respectively in 1951 and 1952.