'''Gradient normalization''' ('''GradNorm''')<ref>{{Cite journal |last=Chen |first=Zhao |last2=Badrinarayanan |first2=Vijay |last3=Lee |first3=Chen-Yu |last4=Rabinovich |first4=Andrew |date=2018-07-03 |title=GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks |url=https://proceedings.mlr.press/v80/chen18a.html |journal=Proceedings of the 35th International Conference on Machine Learning |language=en |publisher=PMLR |pages=794–803}}</ref> normalizes gradient magnitudes across tasks during backpropagation in multitask learning, learning adaptive loss weights so that all tasks train at comparable rates.
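In outline, GradNorm assigns each task loss <math>L_i</math> a learnable weight <math>w_i</math> and measures the per-task gradient norm <math>G_i = \|\nabla_W (w_i L_i)\|_2</math> with respect to a shared set of weights <math>W</math>. The weights are trained by minimizing an auxiliary loss of the form
<math display="block">L_{\text{grad}} = \sum_i \left| G_i - \bar{G} \cdot r_i^{\alpha} \right|</math>
where <math>\bar{G}</math> is the average gradient norm over tasks, <math>r_i</math> is the relative inverse training rate of task <math>i</math>, and <math>\alpha</math> is a hyperparameter controlling the strength of the balancing.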
'''Adaptive layer norm''' ('''adaLN''')<ref>{{Cite journal |last=Peebles |first=William |last2=Xie |first2=Saining |date=2023 |title=Scalable Diffusion Models with Transformers |url=https://openaccess.thecvf.com/content/ICCV2023/html/Peebles_Scalable_Diffusion_Models_with_Transformers_ICCV_2023_paper.html |language=en |pages=4195–4205}}</ref> computes the <math>\gamma, \beta</math> parameters of a LayerNorm not from the layer activation itself, but from other data, such as a conditioning signal.
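For example, a conditioning vector <math>c</math> (such as a diffusion timestep or class embedding) can be mapped by a small network to the scale and shift applied after standardization:
<math display="block">y = \gamma(c) \odot \frac{x - \mu(x)}{\sigma(x)} + \beta(c)</math>
where <math>\mu(x), \sigma(x)</math> are the mean and standard deviation of the activation <math>x</math>, and <math>\gamma(c), \beta(c)</math> are produced from <math>c</math> rather than learned as fixed per-channel parameters.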
== CNN-specific normalization ==