Stochastic approximation: Difference between revisions

Line 78:
C3) <math>\sum_{n=0}^{\infty}\varepsilon_n^2 <\infty </math>
 
C4) <math>|X_n| \leq B, \text{ for a fixed bound }\; B. </math>
 
C5) <math>g(\theta)\; \text{ is strictly convex, i.e.} </math>
 
: <math display="block">\inf_{\delta\leq |\theta - \theta^*|\leq 1/\delta}\langle\theta-\theta^*, \nabla g(\theta)\rangle > 0,\text{ for every }\; 0< \delta < 1.
</math>
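A quick numerical check, sketched in Python (an illustration, not part of the article): the step-size choice <math>\varepsilon_n = 1/n</math> discussed in the text satisfies C3, because <math>\sum_n 1/n^2</math> converges (to <math>\pi^2/6</math>).

```python
import math

# Illustration (not from the article): partial sums of eps_n^2 with eps_n = 1/n
# stay bounded, so C3) holds for this choice of step sizes.
N = 100_000
partial_sum_sq = sum(1.0 / n**2 for n in range(1, N + 1))

print(partial_sum_sq)     # approaches pi^2 / 6 ~ 1.6449
print(math.pi**2 / 6)
```

By contrast, the partial sums of <math>1/n</math> itself grow without bound, which is why such harmonic-type step sizes can still move the iterates arbitrarily far if needed.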
 
Line 92:
</math> is too far away from <math>\theta^* </math>. As for C3), note that if <math>\theta_n </math> converges to <math>\theta^* </math> then
 
<math display="block">\theta_{n+1} - \theta_n = -\varepsilon_n H(\theta_n, X_{n+1}) \rightarrow 0 \text{ as }\; n\rightarrow \infty,</math> so we must have <math>\varepsilon_n \downarrow 0 </math>, and condition C3) ensures it. A natural choice is <math>\varepsilon_n = 1/n </math>. Condition C5) is a fairly stringent condition on the shape of <math>g(\theta)</math>; it ensures that <math>-\nabla g(\theta)</math> points back toward <math>\theta^* </math>, so the algorithm's search direction is well behaved.
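The recursion <math>\theta_{n+1} = \theta_n - \varepsilon_n H(\theta_n, X_{n+1})</math> with <math>\varepsilon_n = 1/n</math> can be sketched in a few lines of Python. The specific choice <math>H(\theta, x) = \theta - x</math> with <math>\operatorname E[X] = 1</math> (so that <math>g(\theta) = \theta - 1</math> and <math>\theta^* = 1</math>) is an illustrative assumption, not taken from the article:

```python
import random

random.seed(0)

# Illustrative sketch of the Robbins-Monro recursion with eps_n = 1/n.
# Assumed toy problem: H(theta, x) = theta - x, X ~ N(1, 2^2), so theta* = 1.
theta = 5.0                      # arbitrary starting point
for n in range(1, 200_001):
    x = random.gauss(1.0, 2.0)   # noisy observation with mean 1
    eps = 1.0 / n                # natural step-size choice; satisfies C3)
    theta -= eps * (theta - x)   # theta_{n+1} = theta_n - eps_n * H(theta_n, x)

print(theta)                     # close to theta* = 1
```

For this particular <math>H</math>, the iterates with <math>\varepsilon_n = 1/n</math> reduce to the running sample mean of the observations, which makes the convergence to <math>\theta^* = 1</math> transparent.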
 
==== Example (where the stochastic gradient method is appropriate)<ref name="jcsbook" /> ====