Revision as of 02:20, 14 October 2024 by Cosmia Nebula (→Local response normalization: global)
Revision as of 19:22, 18 October 2024 by Cosmia Nebula (→Miscellaneous: QK norm)
Line 213:

== Miscellaneous ==

'''Gradient normalization''' ('''GradNorm''')<ref>{{Cite journal |last1=Chen |first1=Zhao |last2=Badrinarayanan |first2=Vijay |last3=Lee |first3=Chen-Yu |last4=Rabinovich |first4=Andrew |date=2018-07-03 |title=GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks |url=https://proceedings.mlr.press/v80/chen18a.html |journal=Proceedings of the 35th International Conference on Machine Learning |language=en |publisher=PMLR |pages=794–803 |arxiv=1711.02257}}</ref> normalizes gradient vectors during backpropagation.

'''Query-Key normalization''' ('''QK-Norm''')<ref>{{Cite journal |last=Henry |first=Alex |last2=Dachapally |first2=Prudhvi Raj |last3=Pawar |first3=Shubham Shantaram |last4=Chen |first4=Yuxuan |date=2020-11 |editor-last=Cohn |editor-first=Trevor |editor2-last=He |editor2-first=Yulan |editor3-last=Liu |editor3-first=Yang |title=Query-Key Normalization for Transformers |url=https://aclanthology.org/2020.findings-emnlp.379/ |journal=Findings of the Association for Computational Linguistics: EMNLP 2020 |location=Online |publisher=Association for Computational Linguistics |pages=4246–4253 |doi=10.18653/v1/2020.findings-emnlp.379}}</ref> is designed for [[Transformer (deep learning architecture)|Transformers]].

== See also ==

Normalization (machine learning): Difference between revisions