Therefore, the sequence <math display="inline">a_n = n^{-\alpha}</math> with <math display="inline">0 < \alpha < 1</math> satisfies this restriction, but <math display="inline">\alpha = 1</math> does not, hence the longer steps. Under the assumptions outlined in the Robbins–Monro algorithm, this modification attains the same asymptotically optimal convergence rate <math display="inline">O(1/\sqrt{n})</math> with a more robust step size policy.<ref name="pj" /> Prior to this, the idea of using longer steps and averaging the iterates had already been proposed by Nemirovski and Yudin<ref name="NY">A. Nemirovski and D. Yudin, On Cesari's convergence of the steepest descent method for approximating saddle points of convex–concave functions, ''Dokl. Akad. Nauk SSSR'' '''239''' (1978, in Russian); ''Soviet Math. Dokl.'' '''19''' (1978, in English).</ref> for solving the stochastic optimization problem with continuous convex objectives and for convex–concave saddle point problems. These algorithms were observed to attain the nonasymptotic rate <math display="inline">O(1/\sqrt{n})</math>.
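As an illustrative sketch (not part of the cited sources), the scheme of longer steps <math display="inline">a_n = n^{-\alpha}</math> combined with averaging of the iterates can be written in Python for a one-dimensional problem. The regression function <math display="inline">M(\theta)=\theta-\theta^*</math>, the noise model, and all parameter values below are assumptions chosen for the example:

```python
import random

def robbins_monro_averaged(theta0, theta_star, alpha=0.7, n_steps=20000,
                           noise=1.0, seed=0):
    """Robbins-Monro iteration theta_{n+1} = theta_n - a_n * Y_n with the
    'longer' step sizes a_n = n^{-alpha}, 0 < alpha < 1, returning both the
    last iterate and the running average of the iterates (Polyak-Ruppert
    style averaging).  The observation model Y_n = (theta - theta*) + noise
    is an assumption made for this illustration."""
    rng = random.Random(seed)
    theta = theta0
    avg = 0.0
    for n in range(1, n_steps + 1):
        a_n = n ** (-alpha)                                  # slowly decaying step
        y_n = (theta - theta_star) + rng.gauss(0.0, noise)   # noisy observation of M(theta)
        theta -= a_n * y_n
        avg += (theta - avg) / n                             # incremental mean of iterates
    return theta, avg
```

In this sketch the averaged estimate is typically much closer to <math display="inline">\theta^*</math> than the last iterate, reflecting the <math display="inline">O(1/\sqrt{n})</math> behavior of the averaged sequence.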
A more general result is given in Chapter 11 of Kushner and Yin.<ref>{{Cite book|url=https://www.springer.com/us/book/9780387008943|title=Stochastic Approximation and Recursive Algorithms and Applications|last1=Kushner|first1=Harold J.|last2=Yin|first2=G. George|publisher=Springer|year=2003|isbn=9780387008943}}</ref> Define the interpolated time <math display="inline">t_n=\sum_{i=0}^{n-1}a_i</math>, the interpolated process <math display="inline">\theta^n(\cdot)</math> and the interpolated normalized process <math display="inline">U^n(\cdot)</math> as
<math display="block">\theta^n(t)=\theta_{n+i},\quad U^n(t)=(\theta_{n+i}-\theta^*)/\sqrt{a_{n+i}}\quad\mbox{for}\quad t\in[t_{n+i}-t_n,t_{n+i+1}-t_n),\; i\ge0.</math>

Let the iterate average be <math>\Theta_n=\frac{a_n}{t}\sum_{i=n}^{n+t/a_n-1}\theta_i</math> and the associated normalized error be <math>\hat{U}^n(t)=\frac{\sqrt{a_n}}{t}\sum_{i=n}^{n+t/a_n-1}(\theta_i-\theta^*)</math>.
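The iterate average <math display="inline">\Theta_n</math> is a windowed mean over the <math display="inline">t/a_n</math> iterates starting at index <math display="inline">n</math>. A minimal Python sketch, assuming for simplicity that the step size stays near <math display="inline">a_n</math> over the window (the function name is hypothetical):

```python
def window_average(thetas, a, n, t):
    """Iterate average Theta_n = (a_n / t) * sum_{i=n}^{n+t/a_n-1} theta_i:
    a mean over a window of interpolated-time length t, i.e. t/a_n iterates.
    thetas and a are sequences of iterates and step sizes; n is the window
    start index; t is the window length in interpolated time."""
    m = int(t / a[n])                      # number of iterates in the window
    return (a[n] / t) * sum(thetas[n:n + m])
```

For a constant sequence of iterates the windowed average simply recovers that constant, since the prefactor <math display="inline">a_n/t</math> is the reciprocal of the window size.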