== Layer normalization ==
'''Layer normalization''' ('''LayerNorm''')<ref name=":2">{{Cite
For a given data input and layer, LayerNorm computes the mean <math>\mu</math> and variance <math>\sigma^2</math> over all the neurons in the layer. As in BatchNorm, learnable scale and shift parameters <math>\gamma_i</math> and <math>\beta_i</math> are then applied. It is defined by:<math display="block">\hat{x}_i = \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}, \quad y_i = \gamma_i \hat{x}_i + \beta_i</math>where <math>
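The computation above can be illustrated with a short sketch in NumPy (an illustrative example, not part of any particular library's implementation; the function name <code>layer_norm</code> and the sample values are chosen for demonstration):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Mean and variance are taken over the feature (neuron) dimension
    # of a single input vector, not over the batch.
    mu = x.mean()
    var = x.var()
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Per-neuron learnable scale (gamma) and shift (beta)
    return gamma * x_hat + beta

x = np.array([1.0, 2.0, 3.0, 4.0])
y = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
# With identity gamma/beta, y has approximately zero mean and unit variance
```

Because the statistics are computed per input rather than per batch, the same computation applies at training and inference time, with no dependence on batch size.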