Content deleted Content added
Citation bot (talk | contribs) Altered template type. Add: class, date, title, eprint, authors 1-7. Changed bare reference to CS1/2. Removed parameters. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by Headbomb | #UCB_toolbar |
m Open access bot: url-access updated in citation with #oabot. |
||
Line 78:
<math>\gamma</math> and <math>\beta</math> allow the network to learn to undo the normalization, if this is beneficial.<ref name=":1">{{Cite book |last1=Goodfellow |first1=Ian |title=Deep learning |last2=Bengio |first2=Yoshua |last3=Courville |first3=Aaron |date=2016 |publisher=The MIT Press |isbn=978-0-262-03561-3 |series=Adaptive computation and machine learning |___location=Cambridge, Massachusetts |chapter=8.7.1. Batch Normalization}}</ref> BatchNorm can be interpreted as removing the purely linear transformations, so that its layers focus solely on modelling the nonlinear aspects of data, which may be beneficial, as a neural network can always be augmented with a linear transformation layer on top.<ref>{{Cite journal |last1=Desjardins |first1=Guillaume |last2=Simonyan |first2=Karen |last3=Pascanu |first3=Razvan |last4=kavukcuoglu |first4=koray |date=2015 |title=Natural Neural Networks |url=https://proceedings.neurips.cc/paper_files/paper/2015/hash/2de5d16682c3c35007e4e92982f1a2ba-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=28}}</ref><ref name=":1" />
It is claimed in the original publication that BatchNorm works by reducing internal covariance shift, though the claim has both supporters<ref>{{Cite journal |last1=Xu |first1=Jingjing |last2=Sun |first2=Xu |last3=Zhang |first3=Zhiyuan |last4=Zhao |first4=Guangxiang |last5=Lin |first5=Junyang |date=2019 |title=Understanding and Improving Layer Normalization |url=https://proceedings.neurips.cc/paper/2019/hash/2f4fe03d77724a7217006e5d16728874-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=32 |arxiv=1911.07013}}</ref><ref>{{Cite journal |last1=Awais |first1=Muhammad |last2=Bin Iqbal |first2=Md. Tauhid |last3=Bae |first3=Sung-Ho |date=November 2021 |title=Revisiting Internal Covariate Shift for Batch Normalization |url=https://ieeexplore.ieee.org/document/9238401 |journal=IEEE Transactions on Neural Networks and Learning Systems |volume=32 |issue=11 |pages=5082–5092 |doi=10.1109/TNNLS.2020.3026784 |issn=2162-237X |pmid=33095717|url-access=subscription }}</ref> and detractors.<ref>{{Cite journal |last1=Bjorck |first1=Nils |last2=Gomes |first2=Carla P |last3=Selman |first3=Bart |last4=Weinberger |first4=Kilian Q |date=2018 |title=Understanding Batch Normalization |url=https://proceedings.neurips.cc/paper/2018/hash/36072923bfc3cf47745d704feb489480-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=31 |arxiv=1806.02375}}</ref><ref>{{Cite journal |last1=Santurkar |first1=Shibani |last2=Tsipras |first2=Dimitris |last3=Ilyas |first3=Andrew |last4=Madry |first4=Aleksander |date=2018 |title=How Does Batch Normalization Help Optimization? |url=https://proceedings.neurips.cc/paper/2018/hash/905056c1ac1dad141560467e0a99e1cf-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=31}}</ref>
=== Special cases ===
|