Stochastic approximation

One has a function <math>M(x)</math> for which one wishes to find the value <math>x_0</math> satisfying <math>M(x_0)=\alpha</math>. However, what is observable is not <math>M(x)</math>, but rather a random variable <math>N(x)</math> such that <math>E(N(x)|x)=M(x)</math>. The algorithm is then to construct a sequence <math>x_1, x_2, \dots</math> which satisfies
::<math>x_{n+1}=x_n+a_n(\alpha-N(x_n))</math>.
Here, <math>a_1, a_2, \dots</math> is a sequence of positive step sizes. Robbins and Monro proved that, if <math>N(x)</math> is uniformly bounded, <math>M(x)</math> is nondecreasing, <math>M'(x_0)</math> exists and is positive, and <math>a_n</math> satisfies a set of bounds (fulfilled if one takes <math>a_n=1/n</math>), then <math>x_n</math> converges in <math>L^2</math> (and hence also in probability) to <math>x_0</math>.<ref name="rm" /><sup>, Theorem 2</sup> In general, the <math>a_n</math> need not equal <math>1/n</math>; however, to ensure convergence they should converge to zero, and in order to average out the noise in <math>N(x)</math>, they should converge slowly.
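As an illustration, the following is a minimal sketch of the recursion with <math>a_n = 1/n</math>; the linear <math>M(x) = 2x + 1</math>, the Gaussian noise, and the target <math>\alpha = 5</math> (so that <math>x_0 = 2</math>) are hypothetical choices made here for concreteness, not taken from Robbins and Monro.
<syntaxhighlight lang="python">
import random

def robbins_monro(noisy_measurement, alpha, x1, n_steps):
    """Robbins-Monro recursion x_{n+1} = x_n + a_n*(alpha - N(x_n)) with a_n = 1/n."""
    x = x1
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n  # step sizes converging to zero, but slowly enough to average out the noise
        x = x + a_n * (alpha - noisy_measurement(x))
    return x

# Hypothetical example: M(x) = 2x + 1 observed with additive Gaussian noise; M(x0) = 5 at x0 = 2.
def N(x):
    return 2.0 * x + 1.0 + random.gauss(0.0, 1.0)

print(robbins_monro(N, alpha=5.0, x1=0.0, n_steps=100000))  # prints a value close to 2
</syntaxhighlight>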
== Kiefer-Wolfowitz algorithm ==
In the Kiefer-Wolfowitz algorithm,<ref>Stochastic Estimation of the Maximum of a Regression Function, [[Jack Kiefer (mathematician)|J. Kiefer]] and J. Wolfowitz, ''Annals of Mathematical Statistics'' '''23''', #3 (September 1952), pp. 462&ndash;466.</ref> introduced a year after the Robbins-Monro algorithm, one wishes to find the maximum, <math>x_0</math>, of the unknown <math>M(x)</math>, and constructs a sequence <math>x_1, x_2, \dots</math> satisfying
::<math>x_{n+1}=x_n+a_n\frac{N(x_n+c_n)-N(x_n-c_n)}{c_n}</math>.
Here, <math>a_1, a_2, \dots</math> is a sequence of positive step sizes which play the same role as in the Robbins-Monro algorithm, and <math>c_1, c_2, \dots</math> is a sequence of positive widths used to estimate, via finite differences, the derivative of <math>M</math>. [[Jack Kiefer (mathematician)|Kiefer]] and Wolfowitz showed that, if <math>a_n</math> and <math>c_n</math> satisfy various bounds (fulfilled by taking <math>a_n=1/n</math> and <math>c_n=n^{-1/3}</math>), and <math>M(x)</math> and <math>N(x)</math> satisfy some technical conditions, then the sequence <math>x_n</math> converges in probability to <math>x_0</math>.
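For illustration, a minimal sketch of the recursion with <math>a_n = 1/n</math> and <math>c_n = n^{-1/3}</math>; the concave <math>M(x) = -(x-3)^2</math> and the Gaussian noise are hypothetical choices, not taken from Kiefer and Wolfowitz.
<syntaxhighlight lang="python">
import random

def kiefer_wolfowitz(noisy_measurement, x1, n_steps):
    """Kiefer-Wolfowitz recursion: climb a finite-difference estimate of the slope of M."""
    x = x1
    for n in range(1, n_steps + 1):
        a_n = 1.0 / n          # step size, as in the Robbins-Monro algorithm
        c_n = n ** (-1.0 / 3)  # width of the finite-difference interval
        # Two noisy observations straddling x_n estimate the derivative of M near x_n.
        x = x + a_n * (noisy_measurement(x + c_n) - noisy_measurement(x - c_n)) / c_n
    return x

# Hypothetical example: M(x) = -(x - 3)^2 observed with additive Gaussian noise; maximum at x0 = 3.
def N(x):
    return -(x - 3.0) ** 2 + random.gauss(0.0, 0.5)

print(kiefer_wolfowitz(N, x1=0.0, n_steps=100000))  # prints a value close to 3
</syntaxhighlight>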
== Subsequent developments ==

An extensive theoretical literature has grown up around these algorithms, concerning conditions for convergence, rates of convergence, multivariate and other generalizations, proper choice of step size, possible noise models, and so on.<ref name="kushneryin">''Stochastic Approximation Algorithms and Applications'', Harold J. Kushner and G. George Yin, New York: Springer-Verlag, 1997. ISBN 038794916X; 2nd ed., titled ''Stochastic Approximation and Recursive Algorithms and Applications'', 2003, ISBN 0387008942.</ref><sup>,</sup><ref>''Stochastic Approximation and Recursive Estimation'', Mikhail Borisovich Nevel'son and Rafail Zalmanovich Has'minskiĭ, translated by Israel Program for Scientific Translations and B. Silver, Providence, RI: American Mathematical Society, 1973, 1976. ISBN 0821815970.</ref> These methods are also applied in [[control theory]], in which case the unknown function whose zero or maximum is sought may vary in time. In that case, the step size <math>a_n</math> should not converge to zero but should be chosen so as to track the function.<ref name="kushneryin" /><sup>, 2nd ed., chapter 3</sup>
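The tracking idea can be sketched as follows; the drifting root, the noise level, and the constant gain are illustrative assumptions, not taken from the cited texts. With a constant step size, a Robbins-Monro-style recursion follows a root <math>x_0(t)</math> that moves slowly in time, at the cost of never averaging the noise away completely.
<syntaxhighlight lang="python">
import random

def track_drifting_root(n_steps, gain=0.1):
    """Robbins-Monro-style recursion with a constant step size (gain), so the
    iterate can follow a root x0(t) that drifts over time instead of freezing."""
    x = 0.0
    for t in range(n_steps):
        x0_t = 2.0 + 0.001 * t                       # slowly drifting root (hypothetical)
        noisy = (x - x0_t) + random.gauss(0.0, 0.1)  # N(x) with M(x) = x - x0(t) and alpha = 0
        x = x + gain * (0.0 - noisy)                 # constant gain instead of a_n -> 0
    return x

print(track_drifting_root(5000))  # stays near the moving root, about 2 + 0.001*5000 = 7
</syntaxhighlight>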