Normalization (machine learning): Difference between revisions

OAbot (talk | contribs)
m Open access bot: url-access updated in citation with #oabot.
 
=== Root mean square layer normalization ===
'''Root mean square layer normalization''' ('''RMSNorm''')<ref>{{cite arXiv |last1=Zhang |first1=Biao |title=Root Mean Square Layer Normalization |date=2019-10-16 |eprint=1910.07467 |last2=Sennrich |first2=Rico |class=cs.LG }}</ref> modifies LayerNorm as follows:
 
<math display="block">
\hat{x_i} = \frac{x_i}{\sqrt{\frac 1D \sum_{i=1}^D x_i^2}}, \quad y_i = \gamma \hat{x_i} + \beta
</math>
 
Essentially, it is LayerNorm with <math>\mu</math> and <math>\epsilon</math> both set to zero: the input is rescaled by its root mean square without first being centered. It is also called '''L2 normalization'''. It is the <math>p = 2</math> case of '''Lp normalization''', or '''power normalization''':<math display="block">
\hat{x_i} = \frac{x_i}{\left(\frac 1D \sum_{i=1}^D |x_i|^p \right)^{1/p}}, \quad y_i = \gamma \hat{x_i} + \beta
</math>where <math>p > 0</math> is a constant.
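
The power normalization formula can be illustrated with a minimal Python sketch. The function names <code>lp_normalize</code> and <code>rms_norm</code> are hypothetical, and <math>\gamma</math> and <math>\beta</math> are taken to be scalars for brevity (an assumption; in practice they are usually learned per-feature parameters). As in the formula, no <math>\epsilon</math> is added, so an all-zero input is not handled.

```python
import math


def lp_normalize(x, p=2.0, gamma=1.0, beta=0.0):
    """Power (Lp) normalization of one feature vector x (a list of floats).

    gamma and beta are scalars here, which is a simplifying assumption.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    d = len(x)
    # Power mean of |x_i|: ((1/D) * sum_i |x_i|^p)^(1/p)
    scale = (sum(abs(v) ** p for v in x) / d) ** (1.0 / p)
    # No epsilon is added, so an all-zero input raises ZeroDivisionError.
    return [gamma * v / scale + beta for v in x]


def rms_norm(x, gamma=1.0, beta=0.0):
    """RMSNorm written out directly: divide by the root mean square of x."""
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return [gamma * v / rms + beta for v in x]


if __name__ == "__main__":
    x = [3.0, -4.0]
    print(rms_norm(x))
    print(lp_normalize(x, p=2.0))  # agrees with rms_norm up to rounding
    print(lp_normalize(x, p=1.0))  # rescaled by the mean absolute value
```

With <math>\gamma = 1</math> and <math>\beta = 0</math>, the output of <code>rms_norm</code> has a root mean square of 1, and <code>lp_normalize</code> with <math>p = 2</math> gives the same result up to floating-point rounding.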
 
=== Adaptive ===