return y
</syntaxhighlight>

For multilayered [[Recurrent neural network|recurrent neural networks]] (RNN), BatchNorm is usually applied ''sequence-wise''.<ref>{{Cite journal |last=Laurent |first=Cesar |last2=Pereyra |first2=Gabriel |last3=Brakel |first3=Philemon |last4=Zhang |first4=Ying |last5=Bengio |first5=Yoshua |date=March 2016 |title=Batch normalized recurrent neural networks |url=http://ieeexplore.ieee.org/document/7472159/ |publisher=IEEE |pages=2657–2661 |doi=10.1109/ICASSP.2016.7472159 |isbn=978-1-4799-9988-0}}</ref> Let the hidden state of the <math>l</math>-th layer at time <math>t</math> be <math>h_t^l</math>. The standard RNN, without normalization, satisfies<math display="block">h^l_t = \phi(W^l h_t^{l-1} + U^l h_{t-1}^{l} + b^l) </math>where <math>W^l, U^l, b^l</math> are the weights and biases, and <math>\phi</math> is the activation function. In sequence-wise BatchNorm, the input-to-hidden term is normalized with statistics computed over both the batch and time dimensions, and the learned shift of the normalization makes the bias <math>b^l</math> redundant:<math display="block">h^l_t = \phi(\mathrm{BN}(W^l h_t^{l-1}) + U^l h_{t-1}^{l}) </math>
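The recurrence above can be sketched in NumPy as follows. This is an illustrative implementation, not code from the cited paper: the <math>\tanh</math> activation, the array shapes, and the function names are assumptions. The key point is that the statistics for <math>\mathrm{BN}(W^l h_t^{l-1})</math> are computed once over the entire (time, batch) extent, rather than separately at each time step.

```python
import numpy as np

def batchnorm(x, gamma, beta, eps=1e-5):
    # Sequence-wise BN: normalize over the time and batch axes (0, 1),
    # keeping per-feature statistics on the last axis.
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def rnn_layer_seqwise_bn(x, W, U, gamma, beta):
    # x: (T, B, d_in) input sequence from the previous layer
    # W: (d_in, d_h) input-to-hidden weights; U: (d_h, d_h) recurrent weights
    T, B, _ = x.shape
    d_h = U.shape[0]
    # Normalize the input-to-hidden term over all T*B positions at once;
    # no bias b is needed, since beta plays that role.
    wx = batchnorm(x @ W, gamma, beta)
    h = np.zeros((B, d_h))
    hs = []
    for t in range(T):
        # Recurrent term U h_{t-1} is left unnormalized, as in the formula.
        h = np.tanh(wx[t] + h @ U)
        hs.append(h)
    return np.stack(hs)  # (T, B, d_h) hidden states
```

Note that only the feedforward term is normalized; normalizing the hidden-to-hidden term <math>U^l h_{t-1}^l</math> as well would require separate statistics per time step.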
=== Improvements ===