Stochastic approximation
where <math>N(x_n+c_n)</math> and <math>N(x_n-c_n)</math> are independent, and the gradient of <math>M(x)</math> is approximated using finite differences. The sequence <math>\{c_n\}</math> specifies the finite-difference widths used for the gradient approximation, while the sequence <math>\{a_n\}</math> specifies the positive step sizes taken along that direction. Kiefer and Wolfowitz proved that, if <math>M(x)</math> satisfies certain regularity conditions, then <math>x_n</math> converges to <math>\theta</math> in probability as <math>n\to\infty</math>. Blum<ref name=":0" /> later showed in 1954 that <math>x_n</math> converges to <math>\theta</math> almost surely, provided that:
* <math>\operatorname{Var}(N(x))\le S<\infty</math> for all <math>x</math>.
* The function <math>M(x)</math> has a unique point of maximum (minimum) and is strongly concave (convex).
** The algorithm was first presented with the requirement that the function <math>M(\cdot)</math> be strongly globally convex (concave) over the entire feasible space. Since this condition is too restrictive to impose over the entire ___domain, Kiefer and Wolfowitz proposed that it suffices to impose the condition on a compact set <math>C_0 \subset \mathbb R^d</math> that is known to contain the optimal solution.
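The scheme above can be sketched in a few lines of Python. This is an illustrative implementation under assumed choices, not a prescribed one: the gain sequences <math>a_n = a/n</math> and <math>c_n = c/n^{1/3}</math> are one standard pair satisfying the classical summability conditions, and the target function, noise model, and the names <code>kiefer_wolfowitz</code> and <code>noisy_m</code> are hypothetical, chosen only for the example.

```python
import random

def kiefer_wolfowitz(noisy_m, x0, n_iters=20000, a=1.0, c=1.0, seed=0):
    """Finite-difference stochastic approximation for maximizing M(x)
    from noisy evaluations N(x), in the Kiefer-Wolfowitz style.

    Uses a_n = a/n and c_n = c/n^(1/3), one common choice satisfying
    the usual conditions sum a_n = inf, sum a_n c_n < inf,
    and sum a_n^2 / c_n^2 < inf."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_iters + 1):
        a_n = a / n
        c_n = c / n ** (1.0 / 3.0)
        # Two independent noisy measurements give a finite-difference
        # estimate of the gradient of M at x_n.
        grad_est = (noisy_m(x + c_n, rng) - noisy_m(x - c_n, rng)) / (2.0 * c_n)
        x = x + a_n * grad_est  # ascend the estimated gradient
    return x

# Illustrative target (an assumption for this sketch): M(x) = -(x - 2)^2,
# maximized at theta = 2, observed through additive Gaussian noise.
def noisy_m(x, rng):
    return -(x - 2.0) ** 2 + rng.gauss(0.0, 0.1)

x_final = kiefer_wolfowitz(noisy_m, x0=0.0)
```

With the decreasing step sizes and shrinking difference widths, the iterate drifts toward the maximizer <math>\theta = 2</math> and the noise in the gradient estimate is averaged out over the run.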