Revision as of 07:02, 5 September 2024 edit David Eppstein (talk \| contribs) Autopatrolled, Administrators 235,670 edits →Convergence of the algorithm: Nicolas Bouleau ← Previous edit		Revision as of 21:36, 22 November 2024 edit undo Cosmia Nebula (talk \| contribs) Extended confirmed users 11,304 edits →Robbins–Monro algorithm: example with estimating mean Tag: Visual edit Next edit →
Line 13: The Robbins–Monro algorithm, introduced in 1951 by [[Herbert Robbins]] and [[John U. Monro#Personal life\|Sutton Monro]],<ref name="rm">{{Cite journal \| last1 = Robbins \| first1 = H. \| author-link = Herbert Robbins\| last2 = Monro \| first2 = S. \| doi = 10.1214/aoms/1177729586 \| title = A Stochastic Approximation Method \| journal = The Annals of Mathematical Statistics \| volume = 22 \| issue = 3 \| pages = 400 \| year = 1951 \| doi-access = free }}</ref> presented a methodology for solving a root finding problem, where the function is represented as an expected value. Assume that we have a function <math display="inline">M(\theta)</math>, and a constant <math display="inline">\alpha</math>, such that the equation <math display="inline">M(\theta) = \alpha</math> has a unique root at <math display="inline">\theta^</math>. It is assumed that while we cannot directly observe the function <math display="inline">M(\theta)</math>, we can instead obtain measurements of the random variable <math display="inline">N(\theta)</math> where <math display="inline">\operatorname E[N(\theta)] = M(\theta)</math>. The structure of the algorithm is to then generate iterates of the form: ::<math display="block">\theta_{n+1}=\theta_n - a_n(N(\theta_n) - \alpha)</math> Here, <math>a_1, a_2, \dots</math> is a sequence of positive step sizes. [[Herbert Robbins\|Robbins]] and Monro proved<ref name="rm" /><sup>, Theorem 2</sup> that <math>\theta_n</math> [[convergence of random variables\|converges]] in <math>L^2</math> (and hence also in probability) to <math>\theta^</math>, and Blum<ref name=":0">{{Cite journal\|last=Blum\|first=Julius R.\|date=1954-06-01\|title=Approximation Methods which Converge with Probability one\|journal=The Annals of Mathematical Statistics\|language=EN\|volume=25\|issue=2\|pages=382–386\|doi=10.1214/aoms/1177728794\|issn=0003-4851\|doi-access=free}}</ref> later proved the convergence is actually with probability one, provided that: Line 20: * <math display="inline">M'(\theta^)</math> exists and is positive, and The sequence <math display="inline">a_n</math> satisfies the following requirements: :: <math>\qquad \sum^{\infty}_{n=0}a_n = \infty \quad \mbox{ and } \quad \sum^{\infty}_{n=0}a^2_n < \infty \quad </math>▼ ▲:: <math display="block">\qquad \sum^{\infty}_{n=0}a_n = \infty \quad \mbox{ and } \quad \sum^{\infty}_{n=0}a^2_n < \infty \quad </math> A particular sequence of steps which satisfy these conditions, and was suggested by Robbins–Monro, have the form: <math display="inline">a_n=a/n</math>, for <math display="inline"> a > 0 </math>. Other series are possible but in order to average out the noise in <math display="inline">N(\theta)</math>, the above condition must be met. === Example === Consider the problem of estimating the mean <math>\theta^</math> of a probability distribution from a stream of independent samples <math>X_1, X_2, \dots</math>. Let <math>N(\theta) := X - \theta</math>, then the unique solution to <math display="inline">\operatorname E[N(\theta)] = 0</math> is the desired mean <math>\theta^</math>. The RM algorithm gives us<math display="block">\theta_{n+1}=\theta_n - a_n(X_n - \theta_n - \alpha)</math> ===Complexity results=== Line 52 ⟶ 57: === Application in stochastic optimization === Suppose we want to solve the following stochastic optimization problem<math display="block">g(\theta^) = \min_{\theta\in\Theta}\operatorname{E}[Q(\theta,X)],</math>where <math display="inline">g(\theta) = \operatorname{E}[Q(\theta,X)]</math> is differentiable and convex, then this problem is equivalent to find the root <math>\theta^</math> of <math>\nabla g(\theta) = 0</math>. Here <math>Q(\theta,X)</math> can be interpreted as some "observed" cost as a function of the chosen <math>\theta</math> and random effects <math>X</math>. In practice, it might be hard to get an analytical form of <math>\nabla g(\theta)</math>, Robbins–Monro method manages to generate a sequence <math>(\theta_n)_{n\geq 0}</math> to approximate <math>\theta^</math> if one can generate <math>(X_n)_{n\geq 0}▼ ~~Suppose we want to solve the following stochastic optimization problem~~ ▲<math display="block">g(\theta^) = \min_{\theta\in\Theta}\operatorname{E}[Q(\theta,X)],</math>where <math display="inline">g(\theta) = \operatorname{E}[Q(\theta,X)]</math> is differentiable and convex, then this problem is equivalent to find the root <math>\theta^</math> of <math>\nabla g(\theta) = 0</math>. Here <math>Q(\theta,X)</math> can be interpreted as some "observed" cost as a function of the chosen <math>\theta</math> and random effects <math>X</math>. In practice, it might be hard to get an analytical form of <math>\nabla g(\theta)</math>, Robbins–Monro method manages to generate a sequence <math>(\theta_n)_{n\geq 0}</math> to approximate <math>\theta^</math> if one can generate <math>(X_n)_{n\geq 0} </math> , in which the conditional expectation of <math>X_n

Stochastic approximation: Difference between revisions