Stochastic approximation
where <math>N(x_n+c_n)</math> and <math>N(x_n-c_n)</math> are independent, and the gradient of <math>M(x)</math> is approximated using finite differences. The sequence <math>\{c_n\}</math> specifies the finite-difference widths used for the gradient approximation, while the sequence <math>\{a_n\}</math> specifies the positive step sizes taken along that direction. Kiefer and Wolfowitz proved that, if <math>M(x)</math> satisfies certain regularity conditions, then <math>x_n</math> converges to <math>\theta</math> in probability as <math>n\to\infty</math>. Blum<ref name=":0" /> later showed in 1954 that <math>x_n</math> converges to <math>\theta</math> almost surely, provided that:
* <math>\operatorname{Var}(N(x))\le S<\infty</math> for all <math>x</math>.
* The function <math>M(x)</math> has a unique point of maximum (minimum) and is strongly concave (convex).
** The algorithm was first presented with the requirement that the function <math>M(\cdot)</math> be strongly globally convex (concave) over the entire feasible space. Since this condition is too restrictive to impose over the entire ___domain, Kiefer and Wolfowitz proposed that it suffices to impose the condition on a compact set <math>C_0 \subset \mathbb R^d</math> that is known to contain the optimal solution.
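The scheme above can be sketched in a few lines of Python. This is an illustrative implementation under assumed choices, not a prescribed one: the gain sequences <math>a_n = a/n</math> and <math>c_n = c/n^{1/3}</math> are one standard pair satisfying the classical summability conditions, and the target function, noise model, and the names <code>kiefer_wolfowitz</code> and <code>noisy_m</code> are hypothetical, chosen only for the example.

```python
import random

def kiefer_wolfowitz(noisy_m, x0, n_iters=20000, a=1.0, c=1.0, seed=0):
    """Finite-difference stochastic approximation for maximizing M(x)
    from noisy evaluations N(x), in the Kiefer-Wolfowitz style.

    Uses a_n = a/n and c_n = c/n^(1/3), one common choice satisfying
    the usual conditions sum a_n = inf, sum a_n c_n < inf,
    and sum a_n^2 / c_n^2 < inf."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_iters + 1):
        a_n = a / n
        c_n = c / n ** (1.0 / 3.0)
        # Two independent noisy measurements give a finite-difference
        # estimate of the gradient of M at x_n.
        grad_est = (noisy_m(x + c_n, rng) - noisy_m(x - c_n, rng)) / (2.0 * c_n)
        x = x + a_n * grad_est  # ascend the estimated gradient
    return x

# Illustrative target (an assumption for this sketch): M(x) = -(x - 2)^2,
# maximized at theta = 2, observed through additive Gaussian noise.
def noisy_m(x, rng):
    return -(x - 2.0) ** 2 + rng.gauss(0.0, 0.1)

x_final = kiefer_wolfowitz(noisy_m, x0=0.0)
```

With the decreasing step sizes and shrinking difference widths, the iterate drifts toward the maximizer <math>\theta = 2</math> and the noise in the gradient estimate is averaged out over the run.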