Stochastic approximation

Stochastic approximation methods are a family of iterative stochastic optimization algorithms that attempt to find zeroes or extrema of functions that cannot be computed directly but can only be estimated via noisy observations. In the optimization setting, this means solving:

\min _{x\in \Theta }\;f(x)=\mathbb {E} [F(x,\xi )]

where the objective is to find the parameter $x\in \Theta$ that minimizes $f$ , the expectation of $F$ over an unknown random variable $\xi$ . The domain $\Theta \subset \mathbb {R} ^{d}$ is assumed to be known, where $d$ is the dimension of the parameter $x$ , but $f(x)$ cannot be computed exactly and must instead be approximated via simulation.
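As a rough illustration of approximating $f(x)$ via simulation, the sketch below averages many draws of $F(x,\xi )$ . The particular choice $F(x,\xi )=(x-\xi )^{2}$ and the Gaussian distribution of $\xi$ are assumptions made only for this example; they do not come from the text.

```python
import random

def F(x, xi):
    # Hypothetical sample objective (an assumption for illustration only).
    return (x - xi) ** 2

def estimate_f(x, num_samples=10_000, seed=0):
    """Monte Carlo estimate of f(x) = E[F(x, xi)] by a sample average."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        xi = rng.gauss(1.0, 0.5)  # assumed distribution of xi: mean 1, std 0.5
        total += F(x, xi)
    return total / num_samples

# For this toy choice, f(x) = (x - 1)^2 + 0.25, so the estimate is
# smallest near x = 1 and grows as x moves away from it.
print(estimate_f(1.0), estimate_f(3.0))
```

Each call returns only a noisy estimate of $f(x)$ , which is the situation the algorithms below are designed to handle.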

The first, and prototypical, algorithms of this kind were the Robbins-Monro and Kiefer-Wolfowitz algorithms.

Robbins-Monro algorithm

The Robbins-Monro algorithm, introduced in 1951^[1], presented a methodology for solving a root finding problem, where the function is represented as an expected value. Let us assume that we have a function $M(x)$ , and a constant $\alpha$ , such that the equation $M(x)=\alpha$ has a unique root at $x=\theta$ . It is assumed that while we cannot directly observe the function $M(x)$ , we can instead obtain measurements of the random variable $N(x)$ where $\mathbb {E} [N(x)]=M(x)$ . The structure of the algorithm is to then generate iterates of the form

x_{n+1}=x_{n}+a_{n}(\alpha -N(x_{n})).

Here, $a_{1},a_{2},\dots$ is a sequence of positive step sizes. Robbins and Monro proved (Theorem 2)^[1] that $x_{n}$ converges in $L^{2}$ (and hence also in probability) to $\theta$ provided that:

$N(x)$ is uniformly bounded,
$M(x)$ is nondecreasing,
$M'(\theta )$ exists and is positive, and
$a_{n}$ satisfies the following requirements:

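\sum _{n=1}^{\infty }a_{n}=\infty \quad {\text{and}}\quad \sum _{n=1}^{\infty }a_{n}^{2}<\infty

The last condition is fulfilled, for example, by taking $a_{n}=1/n$ ; other sequences are possible, but $a_{n}$ must decay to zero slowly enough that the steps still sum to infinity, while decaying fast enough (square-summable) to average out the noise in $N(x)$ .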
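The iteration can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the test function $M(x)=\tanh(x-2)$ , the uniform noise, and the names `robbins_monro` and `noisy_tanh` are all assumptions chosen so that $N(x)$ is bounded, $M$ is nondecreasing, and the root of $M(x)=0$ is $\theta =2$ .

```python
import math
import random

def robbins_monro(measure, alpha, x1, num_steps):
    """Iterate x_{n+1} = x_n + a_n * (alpha - N(x_n)) with a_n = 1/n."""
    x = x1
    for n in range(1, num_steps + 1):
        a_n = 1.0 / n  # satisfies sum a_n = inf and sum a_n^2 < inf
        x = x + a_n * (alpha - measure(x))
    return x

rng = random.Random(0)

def noisy_tanh(x):
    # Hypothetical N(x): M(x) = tanh(x - 2) plus bounded zero-mean noise,
    # so E[N(x)] = M(x) and |N(x)| <= 1.5 everywhere.
    return math.tanh(x - 2.0) + rng.uniform(-0.5, 0.5)

theta_hat = robbins_monro(noisy_tanh, alpha=0.0, x1=0.0, num_steps=5000)
print(theta_hat)  # should land close to the root theta = 2
```

Only noisy evaluations of `noisy_tanh` are used; the function $M$ itself is never evaluated by the iteration.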

Kiefer-Wolfowitz algorithm

In the Kiefer-Wolfowitz algorithm^[2], introduced a year after the Robbins-Monro algorithm, one wishes to find the point $x_{0}$ at which the unknown function $M(x)$ attains its maximum, and constructs a sequence $x_{1},x_{2},\dots$ such that

x_{n+1}=x_{n}+a_{n}{\frac {N(x_{n}+c_{n})-N(x_{n}-c_{n})}{c_{n}}}.

Here, $a_{1},a_{2},\dots$ is a sequence of positive step sizes that serve the same function as in the Robbins-Monro algorithm, and $c_{1},c_{2},\dots$ is a sequence of positive finite-difference widths used to estimate the derivative of $M$ . Kiefer and Wolfowitz showed that, if $a_{n}$ and $c_{n}$ satisfy various bounds (fulfilled by taking $a_{n}=1/n$ , $c_{n}=(1/n)^{1/3}$ ), and $M(x)$ and $N(x)$ satisfy some technical conditions, then the sequence $x_{n}$ converges in probability to $x_{0}$ .
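A minimal Python sketch of this recursion follows, using the step sizes $a_{n}=1/n$ and $c_{n}=(1/n)^{1/3}$ mentioned above. The concave test function $M(x)=-(x-1)^{2}$ , the uniform noise, and the names `kiefer_wolfowitz` and `noisy_parabola` are assumptions for illustration only; the maximizer in this toy problem is $x_{0}=1$ .

```python
import random

def kiefer_wolfowitz(measure, x1, num_steps):
    """Iterate x_{n+1} = x_n + a_n * (N(x_n + c_n) - N(x_n - c_n)) / c_n."""
    x = x1
    for n in range(1, num_steps + 1):
        a_n = 1.0 / n
        c_n = n ** (-1.0 / 3.0)
        # Difference quotient as written in the text; it divides by c_n
        # (not 2 * c_n), so it estimates twice the derivative of M.
        diff = (measure(x + c_n) - measure(x - c_n)) / c_n
        x = x + a_n * diff
    return x

rng = random.Random(0)

def noisy_parabola(x):
    # Hypothetical N(x): M(x) = -(x - 1)^2 plus zero-mean uniform noise.
    return -(x - 1.0) ** 2 + rng.uniform(-0.1, 0.1)

x_hat = kiefer_wolfowitz(noisy_parabola, x1=0.0, num_steps=5000)
print(x_hat)  # should land close to the maximizer x_0 = 1
```

Each step costs two noisy measurements, one on either side of the current iterate.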

Subsequent developments

An extensive theoretical literature has grown up around these algorithms, concerning conditions for convergence, rates of convergence, multivariate and other generalizations, proper choice of step size, possible noise models, and so on.^[3]^[4] These methods are also applied in control theory, where the unknown function to be optimized, or whose zero is sought, may vary in time. In this case, the step size $a_{n}$ should not converge to zero but should be chosen so as to track the function (see chapter 3 of the 2nd edition of ^[3]).
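The effect of the step-size choice in the time-varying case can be seen in a small experiment. The sketch below is only an illustration under assumed conditions: the root $\theta _{n}$ drifts linearly, measurements are $N_{n}(x)=(x-\theta _{n})$ plus uniform noise, and the function name `track_root` and all numeric values are hypothetical.

```python
import random

def track_root(step_size, num_steps=5000, drift=0.001, seed=0):
    """Run the Robbins-Monro update against a slowly drifting root.

    Returns the final tracking error x - theta.
    """
    rng = random.Random(seed)
    x = 0.0
    theta = 0.0
    for n in range(1, num_steps + 1):
        theta = drift * n  # assumed drift: the root moves a little each step
        noisy = (x - theta) + rng.uniform(-0.5, 0.5)  # E[N_n(x)] = x - theta_n
        x = x + step_size(n) * (0.0 - noisy)  # update with alpha = 0
    return x - theta

err_constant = track_root(lambda n: 0.1)      # step size does not decay
err_decaying = track_root(lambda n: 1.0 / n)  # classical decaying step size
print(err_constant, err_decaying)
```

In this toy setup the constant step keeps the iterate near the moving root, at the price of a residual noise level, whereas the $1/n$ schedule effectively averages past targets and falls steadily behind.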

In 1975, C. Johan Masreliez and R. Douglas Martin were the first to apply stochastic approximation to robust estimation.^[5]

References

[1] A Stochastic Approximation Method, Herbert Robbins and Sutton Monro, Annals of Mathematical Statistics 22, #3 (September 1951), pp. 400–407.
[2] Stochastic Estimation of the Maximum of a Regression Function, J. Kiefer and J. Wolfowitz, Annals of Mathematical Statistics 23, #3 (September 1952), pp. 462–466.
[3] Stochastic Approximation Algorithms and Applications, Harold J. Kushner and G. George Yin, New York: Springer-Verlag, 1997. ISBN 038794916X; 2nd ed., titled Stochastic Approximation and Recursive Algorithms and Applications, 2003, ISBN 0387008942.
[4] Stochastic Approximation and Recursive Estimation, Mikhail Borisovich Nevel'son and Rafail Zalmanovich Has'minskiĭ, translated by Israel Program for Scientific Translations and B. Silver, Providence, RI: American Mathematical Society, 1973, 1976. ISBN 0821815970.
[5] Robust Estimation via Stochastic Approximation, R. D. Martin and C. J. Masreliez, IEEE Transactions on Information Theory 21 (1975), pp. 263–271.
