Even though a closed-form solution for ISGD is possible only in least squares, the procedure can be implemented efficiently in a wide range of models. Specifically, suppose that <math>Q_i(w)</math> depends on <math>w</math> only through the linear combination <math>x_i'w</math> with features <math>x_i</math>, so that we can write <math>\nabla_w Q_i(w) = -q(x_i'w) x_i</math>, where the scalar <math>q(\cdot)\in\mathbb{R}</math> may depend on <math>x_i</math> and <math>y_i</math> but depends on <math>w</math> only through <math>x_i'w</math>. Least squares satisfies this condition, as do [[Logistic regression|logistic regression]] and most [[Generalized linear model|generalized linear models]]. For instance, in least squares <math>q(x_i'w) = y_i - x_i'w</math>; in logistic regression <math>q(x_i'w) = y_i - S(x_i'w)</math>, where <math>S(u) = e^u/(1+e^u)</math> is the [[logistic function]]; and in [[Poisson regression]] <math>q(x_i'w) = y_i - e^{x_i'w}</math>, and so on.
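
In code, these three cases are one-liners. A minimal Python sketch follows; the function names and the <code>(u, y)</code> signature are illustrative assumptions, with <code>u</code> standing for <math>x_i'w</math> and <code>y</code> for the response <math>y_i</math>:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch: names and (u, y) signature are assumptions.
# Each function returns the scalar q(u) for one of the models above.

def q_least_squares(u, y):
    return y - u                          # y_i - x_i'w

def q_logistic(u, y):
    return y - 1.0 / (1.0 + np.exp(-u))   # y_i - S(x_i'w), S(u) = e^u/(1+e^u)

def q_poisson(u, y):
    return y - np.exp(u)                  # y_i - e^{x_i'w}
</syntaxhighlight>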
 
In such settings, ISGD can be implemented simply, as follows.
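
Because <math>\nabla_w Q_i(w)</math> is proportional to <math>x_i</math>, the implicit update moves only along <math>x_i</math>: <math>w^{\text{new}} = w^{\text{old}} + \xi x_i</math>, where the scalar <math>\xi</math> solves the fixed-point equation <math>\xi = \eta\, q(x_i'w^{\text{old}} + \xi \|x_i\|^2)</math>. Since <math>q(\cdot)</math> is non-increasing in the models above, this equation has a unique root, which lies in <math>[\min(0, b), \max(0, b)]</math> with <math>b = \eta\, q(x_i'w^{\text{old}})</math> and can therefore be found by [[bisection method|bisection]]. A minimal Python sketch of one such update (the helper name <code>isgd_step</code> and the fixed bisection budget are illustrative assumptions, not part of the article):

<syntaxhighlight lang="python">
def isgd_step(w, x, y, eta, q, n_bisect=50):
    """One implicit SGD update w_new = w_old + xi * x, where the scalar xi
    solves xi = eta * q(x'w_old + xi * ||x||^2, y).

    Sketch under the assumptions above: q(u, y) is non-increasing in u
    (true for the GLM examples), so f(xi) = xi - eta * q(...) is strictly
    increasing and its root lies in [min(0, b), max(0, b)] with
    b = eta * q(x'w_old, y). w and x are NumPy arrays.
    """
    u = x @ w            # linear predictor x_i'w_old
    r = x @ x            # ||x_i||^2
    b = eta * q(u, y)
    lo, hi = min(0.0, b), max(0.0, b)
    for _ in range(n_bisect):            # bisection on the increasing f
        mid = 0.5 * (lo + hi)
        if mid - eta * q(u + mid * r, y) < 0.0:
            lo = mid                     # f(mid) < 0: root is to the right
        else:
            hi = mid                     # f(mid) >= 0: root is to the left
    xi = 0.5 * (lo + hi)
    return w + xi * x
</syntaxhighlight>

For example, <code>w = isgd_step(w, x_i, y_i, 0.1, q_logistic)</code> performs one implicit update for logistic regression, reusing the <code>q_logistic</code> sketch above. Each step costs only a fixed number of scalar evaluations of <math>q(\cdot)</math>, which is why ISGD remains cheap even though the update is implicit.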