return y
</syntaxhighlight>

For multilayered [[Recurrent neural network|recurrent neural networks]] (RNN), BatchNorm is usually applied ''sequence-wise''.<ref>{{Cite journal |last=Laurent |first=Cesar |last2=Pereyra |first2=Gabriel |last3=Brakel |first3=Philemon |last4=Zhang |first4=Ying |last5=Bengio |first5=Yoshua |date=March 2016 |title=Batch normalized recurrent neural networks |url=http://ieeexplore.ieee.org/document/7472159/ |publisher=IEEE |pages=2657–2661 |doi=10.1109/ICASSP.2016.7472159 |isbn=978-1-4799-9988-0}}</ref> Let the hidden state of the <math>l</math>-th layer at time <math>t</math> be <math>h_t^l</math>. The standard RNN, without normalization, satisfies<math display="block">h^l_t = \phi(W^l h_t^{l-1} + U^l h_{t-1}^{l} + b^l) </math>where <math>W^l, U^l, b^l</math> are the weights and biases, and <math>\phi</math> is the activation function. In sequence-wise BatchNorm, the input-to-hidden term is normalized with statistics computed over both the batch and time dimensions, and the learned shift of the normalization makes the bias <math>b^l</math> redundant:<math display="block">h^l_t = \phi(\mathrm{BN}(W^l h_t^{l-1}) + U^l h_{t-1}^{l}) </math>
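The recurrence above can be sketched in NumPy as follows. This is an illustrative implementation, not code from the cited paper: the <math>\tanh</math> activation, the array shapes, and the function names are assumptions. The key point is that the statistics for <math>\mathrm{BN}(W^l h_t^{l-1})</math> are computed once over the entire (time, batch) extent, rather than separately at each time step.

```python
import numpy as np

def batchnorm(x, gamma, beta, eps=1e-5):
    # Sequence-wise BN: normalize over the time and batch axes (0, 1),
    # keeping per-feature statistics on the last axis.
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def rnn_layer_seqwise_bn(x, W, U, gamma, beta):
    # x: (T, B, d_in) input sequence from the previous layer
    # W: (d_in, d_h) input-to-hidden weights; U: (d_h, d_h) recurrent weights
    T, B, _ = x.shape
    d_h = U.shape[0]
    # Normalize the input-to-hidden term over all T*B positions at once;
    # no bias b is needed, since beta plays that role.
    wx = batchnorm(x @ W, gamma, beta)
    h = np.zeros((B, d_h))
    hs = []
    for t in range(T):
        # Recurrent term U h_{t-1} is left unnormalized, as in the formula.
        h = np.tanh(wx[t] + h @ U)
        hs.append(h)
    return np.stack(hs)  # (T, B, d_h) hidden states
```

Note that only the feedforward term is normalized; normalizing the hidden-to-hidden term <math>U^l h_{t-1}^l</math> as well would require separate statistics per time step.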
=== Improvements ===