In such settings, ISGD is simply implemented as follows. Let <math>f(\xi) = \eta q(x_i'w^{old} + \xi ||x_i||^2)</math>, where <math>\xi</math> is a scalar.
Then, ISGD is equivalent to:
:<math>w^{new} = w^{old} + \xi^\ast x_i, \quad\text{where}\quad \xi^\ast = f(\xi^\ast).</math>
 
The scaling factor <math>\xi^\ast\in\mathbb{R}</math> can be found through the [[bisection method]], since in most regular models, such as the aforementioned generalized linear models, the function <math>q()</math> is decreasing; hence <math>f</math> is also decreasing in <math>\xi</math>, its fixed point lies between <math>0</math> and <math>f(0)</math>, and the search bounds for <math>\xi^\ast</math> are <math>[\min(0, f(0)), \max(0, f(0))]</math>.
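
As an illustration (not part of the original text), the following is a minimal Python sketch of one ISGD step with the fixed point found by bisection. It assumes a least-squares model, for which <math>q(z) = y_i - z</math> is decreasing; the names <code>isgd_step</code>, <code>eta</code>, and <code>tol</code> are illustrative rather than standard.

<syntaxhighlight lang="python">
import numpy as np

def isgd_step(w, x, eta, q, tol=1e-10, max_iter=200):
    """One implicit SGD step: find the fixed point xi* = f(xi*), where
    f(xi) = eta * q(x.w_old + xi * ||x||^2), then return w_old + xi* * x.
    Assumes q (hence f) is decreasing, so g(xi) = xi - f(xi) is increasing
    and its root is bracketed by [min(0, f(0)), max(0, f(0))]."""
    norm_sq = np.dot(x, x)
    f = lambda xi: eta * q(np.dot(x, w) + xi * norm_sq)
    f0 = f(0.0)
    lo, hi = min(0.0, f0), max(0.0, f0)  # search bounds from the text
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid - f(mid) > 0.0:  # g(mid) > 0: the root lies to the left
            hi = mid
        else:                   # g(mid) <= 0: the root lies to the right
            lo = mid
        if hi - lo < tol:
            break
    xi_star = 0.5 * (lo + hi)
    return w + xi_star * x

# Least-squares usage: q(z) = y_i - z is decreasing in z.
rng = np.random.default_rng(0)
w, x_i, y_i = np.zeros(3), rng.normal(size=3), 1.0
w = isgd_step(w, x_i, eta=0.1, q=lambda z: y_i - z)
</syntaxhighlight>

For least squares the fixed point in fact has the closed form <math>\xi^\ast = f(0)/(1 + \eta ||x_i||^2)</math>, but the bisection loop above applies to any decreasing <math>q</math>.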