Line 138:
'''Root mean square layer normalization''' ('''RMSNorm''')<ref>{{Citation |last=Zhang |first=Biao |title=Root Mean Square Layer Normalization |date=2019-10-16 |url=http://arxiv.org/abs/1910.07467 |access-date=2024-08-07 |doi=10.48550/arXiv.1910.07467 |last2=Sennrich |first2=Rico}}</ref> modifies LayerNorm to<math display="block">
\hat{x}_i = \frac{x_i}{\sqrt{\frac 1D \sum_{j=1}^D x_j^2}}, \quad y_i = \gamma \hat{x}_i + \beta
</math>Essentially, it is LayerNorm with <math>\mu = 0</math> and <math>\epsilon = 0</math> enforced: the input is rescaled by its root mean square but is not re-centered.
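The relationship to LayerNorm can be checked numerically. The following is a minimal pure-Python sketch, not a reference implementation: the function names <code>rms_norm</code> and <code>layer_norm</code> are illustrative, and <math>\gamma</math> and <math>\beta</math> are taken as scalars for brevity, whereas in practice they are usually learned per-feature vectors. On an input whose mean is already zero, the two normalizations should agree when <math>\epsilon = 0</math>.

```python
import math

def rms_norm(x, gamma=1.0, beta=0.0):
    """RMSNorm of one feature vector x (a list of floats).

    Assumption: gamma and beta are scalars here for simplicity.
    With epsilon = 0, an all-zero input would divide by zero.
    """
    d = len(x)
    rms = math.sqrt(sum(v * v for v in x) / d)  # root mean square of x
    return [gamma * v / rms + beta for v in x]

def layer_norm(x, gamma=1.0, beta=0.0, eps=0.0):
    """LayerNorm of one feature vector, for comparison."""
    d = len(x)
    mu = sum(x) / d                              # mean over features
    var = sum((v - mu) ** 2 for v in x) / d      # biased variance
    return [gamma * (v - mu) / math.sqrt(var + eps) + beta for v in x]

# A mean-zero input: here LayerNorm (eps = 0) and RMSNorm coincide.
x = [3.0, -4.0, 1.0, 0.0]
y_rms = rms_norm(x)
y_ln = layer_norm(x)
```

For inputs with nonzero mean the two outputs differ, because RMSNorm skips the mean subtraction.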
== Other normalizations ==