<math>q() \in\mathbb{R}</math> may depend on <math>x_i, y_i</math> as well, but not on <math>w</math> except through <math>x_i'w</math>. Least squares obeys this rule, as do [[logistic regression]] and most [[generalized linear model]]s. For instance, in least squares <math>q(x_i'w) = y_i - x_i'w</math>, and in logistic regression <math>q(x_i'w) = y_i - S(x_i'w)</math>, where <math>S(u) = e^u/(1+e^u)</math> is the [[logistic function]]. In [[Poisson regression]], <math>q(x_i'w) = y_i - e^{x_i'w}</math>, and so on.
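The residual functions listed above can be written down directly. The following Python sketch (the function names are hypothetical, chosen only for illustration) evaluates <math>q</math> as a function of the linear predictor <math>u = x_i'w</math> and the response <math>y_i</math> for the three models mentioned, using a numerically stable form of the logistic function.

```python
import math

# Hypothetical helper names. Each q depends on the weights w only through
# the linear predictor u = x_i'w, plus the observed response y.

def q_least_squares(u: float, y: float) -> float:
    # Least squares: q(u) = y - u
    return y - u

def logistic(u: float) -> float:
    # Logistic function S(u) = e^u / (1 + e^u), arranged so exp never overflows
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    z = math.exp(u)
    return z / (1.0 + z)

def q_logistic(u: float, y: float) -> float:
    # Logistic regression: q(u) = y - S(u)
    return y - logistic(u)

def q_poisson(u: float, y: float) -> float:
    # Poisson regression: q(u) = y - e^u
    return y - math.exp(u)
```

For fixed <math>y_i</math>, all three functions are decreasing in <math>u</math>, which is the property the bisection search for the implicit step relies on.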
In such settings, ISGD is equivalent to:
:<math>w^{new} = w^{old} + \xi^\ast x_i,\quad\text{where}\quad\xi^\ast = \eta\, q(x_i'w^{old} + \xi^\ast\|x_i\|^2).</math>
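For least squares the implicit equation can be solved in closed form. Substituting <math>q(u) = y_i - u</math> into <math>\xi^\ast = \eta\, q(x_i'w^{old} + \xi^\ast\|x_i\|^2)</math>, with <math>\eta</math> the learning rate, gives an equation that is linear in <math>\xi^\ast</math>, with solution <math>\xi^\ast = \eta (y_i - x_i'w^{old})/(1 + \eta\|x_i\|^2)</math>. A minimal Python sketch, with hypothetical helper names and plain lists standing in for vectors:

```python
from typing import List, Tuple

def dot(a: List[float], b: List[float]) -> float:
    # Inner product of two equal-length vectors
    return sum(ai * bi for ai, bi in zip(a, b))

def isgd_step_least_squares(w: List[float], x: List[float], y: float,
                            eta: float) -> Tuple[List[float], float]:
    # Hypothetical helper: one implicit SGD step for least squares.
    # With q(u) = y - u the fixed-point equation
    #   xi = eta * (y - x'w_old - xi * ||x||^2)
    # is linear in xi, so it is solved exactly rather than iteratively.
    xi = eta * (y - dot(x, w)) / (1.0 + eta * dot(x, x))
    w_new = [wj + xi * xj for wj, xj in zip(w, x)]
    return w_new, xi

# Illustrative values: w_old = 0, x_i = (1, 2), y_i = 3, eta = 0.5
w_new, xi = isgd_step_least_squares([0.0, 0.0], [1.0, 2.0], 3.0, 0.5)
```

Compared with the explicit step <math>\eta(y_i - x_i'w^{old})</math>, the implicit step is divided by <math>1 + \eta\|x_i\|^2</math>, so it shrinks when the learning rate or the norm of <math>x_i</math> is large.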
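The scaling factor <math>\xi^\ast\in\mathbb{R}</math> can be found through the [[bisection method]], since in most regular models, such as the aforementioned generalized linear models, the function <math>q()</math> is decreasing. Then <math>\xi - \eta\, q(x_i'w^{old} + \xi\|x_i\|^2)</math> is increasing in <math>\xi</math>, and the search bounds for <math>\xi^\ast</math> are <math>[\min(0, b_i), \max(0, b_i)]</math>, where <math>b_i = \eta\, q(x_i'w^{old})</math>.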
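A general ISGD step can be sketched with bisection on <math>f(\xi) = \xi - \eta\, q(x_i'w^{old} + \xi\|x_i\|^2)</math>. When <math>q</math> is decreasing, <math>f</math> is increasing, and <math>f \le 0</math> at <math>\min(0, b_i)</math> while <math>f \ge 0</math> at <math>\max(0, b_i)</math>, with <math>b_i = \eta\, q(x_i'w^{old})</math>, so this bracket always contains the root. The Python below is an illustrative sketch under that monotonicity assumption, not a reference implementation; the names <code>isgd_step</code>, <code>tol</code> and <code>max_iter</code> are hypothetical, and <code>q</code> is passed as a function of the linear predictor alone, with <math>y_i</math> already fixed.

```python
import math
from typing import Callable, List, Tuple

def dot(a: List[float], b: List[float]) -> float:
    # Inner product of two equal-length vectors
    return sum(ai * bi for ai, bi in zip(a, b))

def isgd_step(w: List[float], x: List[float], q: Callable[[float], float],
              eta: float, tol: float = 1e-12,
              max_iter: int = 200) -> Tuple[List[float], float]:
    """One implicit SGD step, assuming q is decreasing in its argument.

    Solves xi = eta * q(x'w + xi * ||x||^2) by bisection on
    f(xi) = xi - eta * q(x'w + xi * ||x||^2), which is then increasing.
    """
    u = dot(x, w)                       # linear predictor x_i'w_old
    sq = dot(x, x)                      # ||x_i||^2
    b = eta * q(u)                      # b_i = eta * q(x_i'w_old)
    lo, hi = min(0.0, b), max(0.0, b)   # invariant: f(lo) <= 0 <= f(hi)

    def f(xi: float) -> float:
        return xi - eta * q(u + xi * sq)

    for _ in range(max_iter):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    xi = 0.5 * (lo + hi)
    w_new = [wj + xi * xj for wj, xj in zip(w, x)]
    return w_new, xi

# Usage: logistic regression with response y = 1 (values are illustrative).
y = 1.0
q_logit = lambda u: y - 1.0 / (1.0 + math.exp(-u))
w_new, xi = isgd_step([0.0, 0.0], [1.0, 2.0], q_logit, eta=0.5)
```

For least squares this reproduces the closed-form step <math>\eta (y_i - x_i'w^{old})/(1 + \eta\|x_i\|^2)</math> up to the bisection tolerance. Bisection is used here only because it needs nothing beyond monotonicity of <math>q</math>; a faster root finder could be substituted.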
===Momentum===