Even though a closed-form solution for ISGD is possible only in least squares, the procedure can be implemented efficiently in a wide range of models. Specifically, suppose that <math>Q_i(w)</math> depends on <math>w</math> only through the linear combination <math>x_i'w</math> with features <math>x_i</math>, so that we can write <math>\nabla_w Q_i(w) = -q(x_i'w) x_i</math>, where the scalar <math>q(\cdot)\in\mathbb{R}</math> may depend on <math>x_i</math> and <math>y_i</math> but depends on <math>w</math> only through <math>x_i'w</math>. Least squares satisfies this condition, as do [[Logistic regression|logistic regression]] and most [[Generalized linear model|generalized linear models]]. For instance, in least squares <math>q(x_i'w) = y_i - x_i'w</math>; in logistic regression <math>q(x_i'w) = y_i - S(x_i'w)</math>, where <math>S(u) = e^u/(1+e^u)</math> is the [[logistic function]]; and in [[Poisson regression]] <math>q(x_i'w) = y_i - e^{x_i'w}</math>, and so on.
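
In code, these three cases are one-liners. A minimal Python sketch follows; the function names and the <code>(u, y)</code> signature are illustrative assumptions, with <code>u</code> standing for <math>x_i'w</math> and <code>y</code> for the response <math>y_i</math>:

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch: names and (u, y) signature are assumptions.
# Each function returns the scalar q(u) for one of the models above.

def q_least_squares(u, y):
    return y - u                          # y_i - x_i'w

def q_logistic(u, y):
    return y - 1.0 / (1.0 + np.exp(-u))   # y_i - S(x_i'w), S(u) = e^u/(1+e^u)

def q_poisson(u, y):
    return y - np.exp(u)                  # y_i - e^{x_i'w}
</syntaxhighlight>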
 
In such settings, ISGD can be implemented simply, as follows.
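
Because <math>\nabla_w Q_i(w)</math> is proportional to <math>x_i</math>, the implicit update moves only along <math>x_i</math>: <math>w^{\text{new}} = w^{\text{old}} + \xi x_i</math>, where the scalar <math>\xi</math> solves the fixed-point equation <math>\xi = \eta\, q(x_i'w^{\text{old}} + \xi \|x_i\|^2)</math>. Since <math>q(\cdot)</math> is non-increasing in the models above, this equation has a unique root, which lies in <math>[\min(0, b), \max(0, b)]</math> with <math>b = \eta\, q(x_i'w^{\text{old}})</math> and can therefore be found by [[bisection method|bisection]]. A minimal Python sketch of one such update (the helper name <code>isgd_step</code> and the fixed bisection budget are illustrative assumptions, not part of the article):

<syntaxhighlight lang="python">
def isgd_step(w, x, y, eta, q, n_bisect=50):
    """One implicit SGD update w_new = w_old + xi * x, where the scalar xi
    solves xi = eta * q(x'w_old + xi * ||x||^2, y).

    Sketch under the assumptions above: q(u, y) is non-increasing in u
    (true for the GLM examples), so f(xi) = xi - eta * q(...) is strictly
    increasing and its root lies in [min(0, b), max(0, b)] with
    b = eta * q(x'w_old, y). w and x are NumPy arrays.
    """
    u = x @ w            # linear predictor x_i'w_old
    r = x @ x            # ||x_i||^2
    b = eta * q(u, y)
    lo, hi = min(0.0, b), max(0.0, b)
    for _ in range(n_bisect):            # bisection on the increasing f
        mid = 0.5 * (lo + hi)
        if mid - eta * q(u + mid * r, y) < 0.0:
            lo = mid                     # f(mid) < 0: root is to the right
        else:
            hi = mid                     # f(mid) >= 0: root is to the left
    xi = 0.5 * (lo + hi)
    return w + xi * x
</syntaxhighlight>

For example, <code>w = isgd_step(w, x_i, y_i, 0.1, q_logistic)</code> performs one implicit update for logistic regression, reusing the <code>q_logistic</code> sketch above. Each step costs only a fixed number of scalar evaluations of <math>q(\cdot)</math>, which is why ISGD remains cheap even though the update is implicit.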