Normalization (machine learning): Difference between revisions

 
=== Improvements ===
BatchNorm has been very popular, and many improvements to it have been proposed. Some examples include:<ref name=":3">{{cite arXiv | eprint=1906.03548 | last1=Summers | first1=Cecilia | last2=Dinneen | first2=Michael J. | title=Four Things Everyone Should Know to Improve Batch Normalization | date=2019 | class=cs.LG }}</ref>
 
* Ghost batch: Randomly partition a batch into sub-batches and perform BatchNorm separately on each.
</math>where <math>\alpha</math> is a hyperparameter to be optimized on a validation set.
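The ghost-batch idea above can be sketched in a few lines. This is a minimal NumPy illustration, not a production layer: the function name and shapes are illustrative, and the learned scale and shift parameters (γ, β) of a full BatchNorm layer are omitted.

```python
import numpy as np

def ghost_batch_norm(x, ghost_size, eps=1e-5):
    """Normalize each sub-batch ("ghost batch") of x independently.

    x: array of shape (batch, features); ghost_size must divide batch.
    Illustrative sketch only: the learnable gamma/beta of BatchNorm
    and running statistics for inference are omitted.
    """
    batch, _ = x.shape
    assert batch % ghost_size == 0, "ghost_size must divide the batch size"
    out = np.empty_like(x, dtype=float)
    for start in range(0, batch, ghost_size):
        sub = x[start:start + ghost_size]
        # Statistics are computed per ghost batch, not over the full batch.
        mean = sub.mean(axis=0)
        var = sub.var(axis=0)
        out[start:start + ghost_size] = (sub - mean) / np.sqrt(var + eps)
    return out
```

Because each sub-batch is normalized with its own statistics, ghost batching injects more noise than full-batch BatchNorm, which is the intended regularizing effect.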
 
Other works attempt to eliminate BatchNorm entirely, such as the Normalizer-Free ResNet.<ref>{{cite arXiv | eprint=2102.06171 | last1=Brock | first1=Andrew | last2=De | first2=Soham | last3=Smith | first3=Samuel L. | last4=Simonyan | first4=Karen | title=High-Performance Large-Scale Image Recognition Without Normalization | date=2021 | class=cs.CV }}</ref>
 
== Layer normalization ==