Normalization (machine learning): Difference between revisions

 
return y
</syntaxhighlight>For multilayered [[Recurrent neural network|recurrent neural networks]] (RNN), BatchNorm is usually applied only to the input-to-hidden part of the computation.<ref>{{Cite journal |last=Laurent |first=Cesar |last2=Pereyra |first2=Gabriel |last3=Brakel |first3=Philemon |last4=Zhang |first4=Ying |last5=Bengio |first5=Yoshua |date=2016-03 |title=Batch normalized recurrent neural networks |url=http://ieeexplore.ieee.org/document/7472159/ |publisher=IEEE |pages=2657–2661 |doi=10.1109/ICASSP.2016.7472159 |isbn=978-1-4799-9988-0}}</ref> Let the hidden state of the <math>l</math>-th layer at time <math>t</math> be <math>h_t^l</math>. The standard RNN, without normalization, satisfies<math display="block">h^l_t = \phi(W^l h_t^{l-1} + U^l h_{t-1}^{l} + b^l) </math>where <math>W^l, U^l, b^l</math> are weights and biases, and <math>\phi</math> is the activation function. Applying BatchNorm, this becomes<math display="block">h^l_t = \phi(\mathrm{BN}(W^l h_t^{l-1}) + U^l h_{t-1}^{l}) </math>There are two possible ways to define what a "batch" is in BatchNorm for RNNs: ''frame-wise'' and ''sequence-wise''. Concretely, consider applying an RNN to process a batch of sentences. Let <math>h_{b, t}^l</math> be the hidden state of the <math>l</math>-th layer for the <math>t</math>-th token of the <math>b</math>-th input sentence. Then frame-wise BatchNorm means normalizing over <math>b</math>:<math display="block">
\begin{aligned}
\mu_t^l &= \frac{1}{B} \sum_{b=1}^B h_{b,t}^l \\
(\sigma_t^l)^2 &= \frac{1}{B} \sum_{b=1}^B (h_{b,t}^l - \mu_t^l)^2
\end{aligned}
</math>and sequence-wise means normalizing over <math>(b, t)</math>:<math display="block">
\begin{aligned}
\mu^l &= \frac{1}{BT} \sum_{b=1}^B\sum_{t=1}^T h_{b,t}^l \\
(\sigma^l)^2 &= \frac{1}{BT} \sum_{b=1}^B\sum_{t=1}^T (h_{b,t}^l - \mu^l)^2
\end{aligned}
</math>It is also possible to apply BatchNorm to [[Long short-term memory|LSTMs]].<ref>{{Citation |last=Cooijmans |first=Tim |title=Recurrent Batch Normalization |date=2017-02-28 |url=https://arxiv.org/abs/1603.09025 |publisher=arXiv |doi=10.48550/arXiv.1603.09025 |id=arXiv:1603.09025 |last2=Ballas |first2=Nicolas |last3=Laurent |first3=César |last4=Gülçehre |first4=Çağlar |last5=Courville |first5=Aaron}}</ref>
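The difference between the two batch definitions can be illustrated with a short NumPy sketch (a minimal illustration only; the array shapes, variable names, and epsilon value are assumptions for this example, not taken from the cited papers):<syntaxhighlight lang="python">
import numpy as np

# Hidden activations for a batch of B sentences, T tokens each, H hidden units:
# h[b, t] plays the role of h_{b,t}^l in the equations above.
B, T, H = 4, 10, 8
rng = np.random.default_rng(0)
h = rng.normal(size=(B, T, H))
eps = 1e-5  # small constant for numerical stability (assumed value)

# Frame-wise: separate statistics for each time step t, averaged over the batch axis b.
mu_frame = h.mean(axis=0)        # shape (T, H): one mean per time step
var_frame = h.var(axis=0)        # shape (T, H): one variance per time step
h_frame = (h - mu_frame) / np.sqrt(var_frame + eps)

# Sequence-wise: a single set of statistics shared across time, averaged over (b, t).
mu_seq = h.mean(axis=(0, 1))     # shape (H,)
var_seq = h.var(axis=(0, 1))     # shape (H,)
h_seq = (h - mu_seq) / np.sqrt(var_seq + eps)
</syntaxhighlight>Note that frame-wise BatchNorm keeps <math>T</math> separate sets of statistics, one per time step, whereas sequence-wise BatchNorm shares a single set across all time steps.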
 
=== Improvements ===