==Kiefer–Wolfowitz algorithm==
The Kiefer–Wolfowitz algorithm was introduced in 1952 by [[Jacob Wolfowitz]] and [[Jack_Kiefer_(statistician)|Jack Kiefer]],<ref name = "KW">{{Cite journal | last1 = Kiefer | first1 = J. | last2 = Wolfowitz | first2 = J. | doi = 10.1214/aoms/1177729392 | title = Stochastic Estimation of the Maximum of a Regression Function | journal = The Annals of Mathematical Statistics | volume = 23 | issue = 3 | pages = 462 | year = 1952 | doi-access = free }}</ref> and was motivated by the publication of the Robbins–Monro algorithm. However, the algorithm was presented as a method which would stochastically estimate the maximum of a function.
Let <math>M(x)</math> be a function which has a maximum at the point <math>\theta</math>. It is assumed that <math>M(x)</math> is unknown; however, certain observations <math>N(x)</math>, where <math>\operatorname E[N(x)] = M(x)</math>, can be made at any point <math>x</math>. The structure of the algorithm follows a gradient-like method, with the iterates being generated as
::<math> x_{n+1} = x_n + a_n \cdot \left(\frac{N(x_n + c_n) - N(x_n - c_n)}{2 c_n}\right)</math>
where <math>N(x_n+c_n)</math> and <math>N(x_n-c_n)</math> are independent observations, so that at every step the gradient of <math>M(x)</math> is approximated by a central finite difference with spacing <math>2c_n</math>.
Kiefer and Wolfowitz proved that, if <math>M(x)</math> satisfies certain regularity conditions, then <math>x_n</math> will converge to <math>\theta</math> in probability as <math>n\to\infty </math>, and later Blum<ref name=":0" /> in 1954 showed that <math>x_n</math> converges to <math>\theta</math> almost surely, provided that:
* <math>\operatorname{Var}(N(x))\le S<\infty</math> for all <math>x</math>.
* The function <math>M(\cdot)</math> has a unique point of maximum and satisfies suitable regularity conditions.
* The sequences <math>a_n</math> and <math>c_n</math> are infinite sequences of positive numbers such that
** <math>c_n \rightarrow 0\quad \text{as}\quad n\to\infty </math>
** <math>\sum^\infty_{n=0} a_n = \infty </math>
** <math>\sum^\infty_{n=0} a_nc_n < \infty </math>
** <math>\sum^\infty_{n=0} a_n^2c_n^{-2} < \infty </math>
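A minimal numerical sketch of the iteration (not from the original paper): the hypothetical objective below is <math>M(x) = -(x-2)^2</math> observed with additive Gaussian noise, and the gain sequences <math>a_n = 1/n</math>, <math>c_n = n^{-1/3}</math> are one common choice satisfying the conditions above.

```python
import random

def kiefer_wolfowitz(noisy_f, x0, n_iters=5000, seed=0):
    """Stochastically approximate the maximizer of an unknown M(x),
    observed only through noisy evaluations N(x) with E[N(x)] = M(x)."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_iters + 1):
        a_n = 1.0 / n           # sum of a_n diverges
        c_n = n ** (-1.0 / 3)   # c_n -> 0; sum a_n^2 / c_n^2 is finite
        # Central finite-difference estimate of the gradient of M at x,
        # built from two independent noisy observations.
        grad_est = (noisy_f(x + c_n, rng) - noisy_f(x - c_n, rng)) / (2 * c_n)
        x = x + a_n * grad_est
    return x

# Hypothetical test objective: M(x) = -(x - 2)^2, maximum at theta = 2,
# observed with zero-mean Gaussian noise.
def noisy_f(x, rng):
    return -(x - 2.0) ** 2 + rng.gauss(0.0, 0.1)

x_hat = kiefer_wolfowitz(noisy_f, x0=0.0)
```

With these choices the iterates drift toward <math>\theta = 2</math>, since the finite-difference term estimates the gradient while the shrinking steps <math>a_n</math> average out the observation noise.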