→Interpretation: improvement |
Line 108:
\sigma^2 &= (\alpha E[x^2] + (1 - \alpha) \mu_{x^2, \text{train}}) - \mu^2
\end{aligned}
</math> where <math>\alpha</math> is a hyperparameter to be optimized on a validation set.
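The variance formula can be illustrated with a minimal Python sketch. It rests on several assumptions not confirmed by the text here: the first term is read as the batch second moment <math>E[x^2]</math>, <math>\mu</math> is taken to be the analogous <math>\alpha</math>-weighted blend of the batch mean and the training mean, <math>\alpha</math> lies in <math>[0, 1]</math>, and a single scalar feature is normalized. The function and argument names are hypothetical.

```python
def blended_batchnorm_stats(batch, mu_train, second_moment_train, alpha):
    """Blend batch moments with stored training moments.

    Assumptions (illustrative only): scalar feature, alpha in [0, 1],
    and the training second moment E[x^2] was tracked with the mean.
    """
    n = len(batch)
    batch_mean = sum(batch) / n                    # E[x] over the batch
    batch_second = sum(v * v for v in batch) / n   # E[x^2] over the batch

    # Assumed form of the mean: same alpha-weighted blend as the variance.
    mu = alpha * batch_mean + (1 - alpha) * mu_train

    # Blend the second moments, then subtract the squared blended mean.
    var = (alpha * batch_second + (1 - alpha) * second_moment_train) - mu ** 2
    return mu, var


if __name__ == "__main__":
    # alpha = 1 uses only batch statistics; alpha = 0 uses only training ones.
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, blended_batchnorm_stats([1.0, 2.0, 3.0], 0.0, 1.0, alpha))
```

Under these assumptions, <math>\alpha = 1</math> recovers the ordinary batch variance and <math>\alpha = 0</math> recovers the variance implied by the stored training moments, so intermediate values interpolate between the two.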
== Layer normalization ==