Revision as of 20:56, 17 May 2025 (Citation bot): Altered template type. Add: class, date, title, eprint, authors 1-7. Changed bare reference to CS1/2. Removed parameters. Some additions/deletions were parameter name changes. Suggested by Headbomb.
Revision as of 15:56, 26 May 2025 (OAbot): Open access bot: url-access updated in citation with #oabot.
<math>\gamma</math> and <math>\beta</math> allow the network to learn to undo the normalization if this is beneficial.<ref name=":1">{{Cite book |last1=Goodfellow |first1=Ian |title=Deep learning |last2=Bengio |first2=Yoshua |last3=Courville |first3=Aaron |date=2016 |publisher=The MIT Press |isbn=978-0-262-03561-3 |series=Adaptive computation and machine learning |location=Cambridge, Massachusetts |chapter=8.7.1. Batch Normalization}}</ref> BatchNorm can be interpreted as removing the purely linear transformations, so that the network's layers focus solely on modelling the nonlinear aspects of the data. This may be beneficial because a linear transformation layer can always be added on top of a neural network.<ref>{{Cite journal |last1=Desjardins |first1=Guillaume |last2=Simonyan |first2=Karen |last3=Pascanu |first3=Razvan |last4=Kavukcuoglu |first4=Koray |date=2015 |title=Natural Neural Networks |url=https://proceedings.neurips.cc/paper_files/paper/2015/hash/2de5d16682c3c35007e4e92982f1a2ba-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=28}}</ref><ref name=":1" /> The original publication claims that BatchNorm works by reducing internal covariate shift, a claim that has both supporters<ref>{{Cite journal |last1=Xu |first1=Jingjing |last2=Sun |first2=Xu |last3=Zhang |first3=Zhiyuan |last4=Zhao |first4=Guangxiang |last5=Lin |first5=Junyang |date=2019 |title=Understanding and Improving Layer Normalization |url=https://proceedings.neurips.cc/paper/2019/hash/2f4fe03d77724a7217006e5d16728874-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=32 |arxiv=1911.07013}}</ref><ref>{{Cite journal |last1=Awais |first1=Muhammad |last2=Bin Iqbal |first2=Md. Tauhid |last3=Bae |first3=Sung-Ho |date=November 2021 |title=Revisiting Internal Covariate Shift for Batch Normalization |url=https://ieeexplore.ieee.org/document/9238401 |journal=IEEE Transactions on Neural Networks and Learning Systems |volume=32 |issue=11 |pages=5082–5092 |doi=10.1109/TNNLS.2020.3026784 |issn=2162-237X |pmid=33095717 |url-access=subscription}}</ref> and detractors.<ref>{{Cite journal |last1=Bjorck |first1=Nils |last2=Gomes |first2=Carla P |last3=Selman |first3=Bart |last4=Weinberger |first4=Kilian Q |date=2018 |title=Understanding Batch Normalization |url=https://proceedings.neurips.cc/paper/2018/hash/36072923bfc3cf47745d704feb489480-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=31 |arxiv=1806.02375}}</ref><ref>{{Cite journal |last1=Santurkar |first1=Shibani |last2=Tsipras |first2=Dimitris |last3=Ilyas |first3=Andrew |last4=Madry |first4=Aleksander |date=2018 |title=How Does Batch Normalization Help Optimization? |url=https://proceedings.neurips.cc/paper/2018/hash/905056c1ac1dad141560467e0a99e1cf-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=31}}</ref>

=== Special cases ===

Normalization (machine learning): Difference between revisions