In such settings, ISGD is simply implemented as follows. Let <math>f(\xi) = \eta q(x_i'w^{old} + \xi ||x_i||^2)</math>, where <math>\xi</math> is a scalar.
Then, ISGD is equivalent to:
:<math>w^{new} = w^{old} + \xi^\ast x_i, \quad\text{where}\quad \xi^\ast = f(\xi^\ast).</math>
 
The scaling factor <math>\xi^\ast\in\mathbb{R}</math> can be found through the [[bisection method]], since in most regular models, such as the aforementioned generalized linear models, the function <math>q()</math> is decreasing; hence <math>f</math> is also decreasing in <math>\xi</math>, its fixed point lies between <math>0</math> and <math>f(0)</math>, and the search bounds for <math>\xi^\ast</math> are <math>[\min(0, f(0)), \max(0, f(0))]</math>.
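
As an illustration (not part of the original text), the following is a minimal Python sketch of one ISGD step with the fixed point found by bisection. It assumes a least-squares model, for which <math>q(z) = y_i - z</math> is decreasing; the names <code>isgd_step</code>, <code>eta</code>, and <code>tol</code> are illustrative rather than standard.

<syntaxhighlight lang="python">
import numpy as np

def isgd_step(w, x, eta, q, tol=1e-10, max_iter=200):
    """One implicit SGD step: find the fixed point xi* = f(xi*), where
    f(xi) = eta * q(x.w_old + xi * ||x||^2), then return w_old + xi* * x.
    Assumes q (hence f) is decreasing, so g(xi) = xi - f(xi) is increasing
    and its root is bracketed by [min(0, f(0)), max(0, f(0))]."""
    norm_sq = np.dot(x, x)
    f = lambda xi: eta * q(np.dot(x, w) + xi * norm_sq)
    f0 = f(0.0)
    lo, hi = min(0.0, f0), max(0.0, f0)  # search bounds from the text
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid - f(mid) > 0.0:  # g(mid) > 0: the root lies to the left
            hi = mid
        else:                   # g(mid) <= 0: the root lies to the right
            lo = mid
        if hi - lo < tol:
            break
    xi_star = 0.5 * (lo + hi)
    return w + xi_star * x

# Least-squares usage: q(z) = y_i - z is decreasing in z.
rng = np.random.default_rng(0)
w, x_i, y_i = np.zeros(3), rng.normal(size=3), 1.0
w = isgd_step(w, x_i, eta=0.1, q=lambda z: y_i - z)
</syntaxhighlight>

For least squares the fixed point in fact has the closed form <math>\xi^\ast = f(0)/(1 + \eta ||x_i||^2)</math>, but the bisection loop above applies to any decreasing <math>q</math>.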