Stochastic gradient descent

Consider least squares with features <math>x_1, \ldots, x_n \in\mathbb{R}^p</math> and observations
<math>y_1, \ldots, y_n\in\mathbb{R}</math>. We wish to solve:
:<math>\min_w \sum_{j=1}^n (y_j - x_j'w)^2,</math>
where <math>x_j' w = x_{j,1} w_1 + x_{j,2} w_2 + \cdots + x_{j,p} w_p</math> denotes the inner product.
Note that <math>x_j</math> could have "1" as its first element to include an intercept. Classical stochastic gradient descent proceeds as follows:
:<math>w^{new} = w^{old} + \eta (y_i - x_i'w^{old}) x_i,</math>
where at each step the index <math>i</math> is chosen at random from <math>\{1, \ldots, n\}</math>.
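The update above translates directly into code. The following is a minimal sketch in Python with NumPy, assuming the rows of a data matrix <code>X</code> are the feature vectors <math>x_j'</math>; the names <code>sgd_least_squares</code>, <code>eta</code>, and <code>n_epochs</code> are illustrative choices, not part of the article.

<syntaxhighlight lang="python">
import numpy as np

def sgd_least_squares(X, y, eta=0.01, n_epochs=100, rng=None):
    """Minimize sum_j (y_j - x_j'w)^2 with the classical SGD update (sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_epochs):
        for i in rng.permutation(n):       # visit the examples in a random order
            residual = y[i] - X[i] @ w     # y_i - x_i'w
            w = w + eta * residual * X[i]  # w_new = w_old + eta * (y_i - x_i'w_old) * x_i
    return w

# Example use: prepend a column of ones to X to include an intercept.
X = np.column_stack([np.ones(50), np.linspace(0, 1, 50)])
y = 2.0 + 3.0 * X[:, 1]
w_hat = sgd_least_squares(X, y, eta=0.1, n_epochs=200)
</syntaxhighlight>

A constant step size <math>\eta</math> is used here for simplicity; in practice the step size is often decreased over the iterations.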