Normalization (machine learning)

== Miscellaneous ==
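'''Gradient normalization''' ('''GradNorm''')<ref>{{Cite journal |last1=Chen |first1=Zhao |last2=Badrinarayanan |first2=Vijay |last3=Lee |first3=Chen-Yu |last4=Rabinovich |first4=Andrew |date=2018-07-03 |title=GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks |url=https://proceedings.mlr.press/v80/chen18a.html |journal=Proceedings of the 35th International Conference on Machine Learning |language=en |publisher=PMLR |pages=794–803 |arxiv=1711.02257}}</ref> normalizes gradient magnitudes across the tasks of a multitask network during backpropagation. It does this by adjusting the weight of each task's loss during training, so that the gradient norms of the weighted task losses with respect to a set of shared weights are pulled toward a common target, which is raised for tasks that are training more slowly.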
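
The weight update can be sketched in Python with NumPy as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name <code>gradnorm_update</code> is hypothetical, the caller is assumed to supply the norms <math>\|\nabla_W L_i\|</math> of the unweighted task-loss gradients with respect to the shared weights <math>W</math> (the paper uses the last shared layer), and clipping the weights to stay positive is an added assumption. With loss ratios <math>\tilde L_i = L_i(t)/L_i(0)</math> and relative inverse training rates <math>r_i = \tilde L_i / \operatorname{mean}_j \tilde L_j</math>, each weighted norm <math>G_i = w_i \|\nabla_W L_i\|</math> is pushed toward the target <math>\bar G \, r_i^{\alpha}</math> by descending <math>\textstyle\sum_i |G_i - \bar G \, r_i^{\alpha}|</math> with the target held fixed, after which the weights are renormalized to sum to the number of tasks.

```python
import numpy as np


def gradnorm_update(weights, task_grad_norms, losses, initial_losses, alpha, lr):
    """One GradNorm-style update of the task-loss weights (minimal sketch).

    weights         -- current positive loss weights w_i, shape (T,)
    task_grad_norms -- ||grad_W L_i|| of the *unweighted* task losses with
                       respect to the shared weights W, shape (T,)
    losses          -- current task losses L_i(t), shape (T,)
    initial_losses  -- task losses at the first step, L_i(0), shape (T,)
    alpha           -- hyperparameter controlling how strongly training
                       rates shift the targets
    lr              -- step size for the weight update
    """
    w = np.asarray(weights, dtype=float)
    g = np.asarray(task_grad_norms, dtype=float)
    num_tasks = w.size

    # Norms of the weighted-loss gradients: G_i = w_i * ||grad_W L_i||.
    G = w * g
    G_mean = G.mean()

    # Relative inverse training rates: r_i = (L_i(t)/L_i(0)) / mean_j(L_j(t)/L_j(0)).
    loss_ratio = np.asarray(losses, dtype=float) / np.asarray(initial_losses, dtype=float)
    r = loss_ratio / loss_ratio.mean()

    # Target norms, held constant (no gradient flows through them).
    target = G_mean * r ** alpha

    # d/dw_i of sum_i |G_i - target_i| is sign(G_i - target_i) * ||grad_W L_i||.
    grad_w = np.sign(G - target) * g

    # Gradient-descent step; keeping weights positive is an assumption of this sketch.
    w = np.clip(w - lr * grad_w, 1e-8, None)

    # Renormalize so the weights sum to the number of tasks.
    return w * num_tasks / w.sum()


# Usage with made-up numbers: task 0 has the larger gradient norm and the lower loss ratio.
new_w = gradnorm_update(weights=[1.0, 1.0], task_grad_norms=[4.0, 1.0],
                        losses=[0.5, 0.9], initial_losses=[1.0, 1.0],
                        alpha=1.5, lr=0.1)
print(new_w)  # task 0 is down-weighted: approximately [0.706 1.294]
```

In this example task 0 both dominates the shared gradient and is training faster, so its weight falls while the weights still sum to two.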
 
'''Query-Key normalization''' ('''QK-Norm''')<ref>{{Cite journal |last=Henry |first=Alex |last2=Dachapally |first2=Prudhvi Raj |last3=Pawar |first3=Shubham Shantaram |last4=Chen |first4=Yuxuan |date=2020-11 |editor-last=Cohn |editor-first=Trevor |editor2-last=He |editor2-first=Yulan |editor3-last=Liu |editor3-first=Yang |title=Query-Key Normalization for Transformers |url=https://aclanthology.org/2020.findings-emnlp.379/ |journal=Findings of the Association for Computational Linguistics: EMNLP 2020 |location=Online |publisher=Association for Computational Linguistics |pages=4246–4253 |doi=10.18653/v1/2020.findings-emnlp.379}}</ref> is designed for [[Transformer (deep learning architecture)|Transformers]]. It applies <math>\ell_2</math> normalization to each query and key vector before their dot products are taken in the attention mechanism, so that the attention logits are cosine similarities; instead of being divided by <math>\sqrt{d}</math>, these logits are multiplied by a learnable scaling parameter.
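
A minimal single-head sketch in Python with NumPy follows; it is an illustration, not the authors' code. The function name <code>qk_norm_attention</code> is hypothetical, the scale <code>g</code> is passed in as a plain float where a model would learn it, and the small <code>eps</code> guarding against division by zero is an added assumption. Because the queries and keys are normalized, the logits are bounded by <math>|g|</math> in absolute value and the output does not change if the queries are rescaled by a positive constant.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def qk_norm_attention(Q, K, V, g, eps=1e-6):
    """Single-head attention with query-key normalization (minimal sketch).

    Q: (n, d) queries, K: (m, d) keys, V: (m, d_v) values.
    g: scalar that scales the logits; learnable in a real model, a float here.
    """
    # Scale every query and key vector to (approximately) unit L2 norm.
    Qn = Q / (np.linalg.norm(Q, axis=-1, keepdims=True) + eps)
    Kn = K / (np.linalg.norm(K, axis=-1, keepdims=True) + eps)
    # Dot products of unit vectors are cosine similarities in [-1, 1];
    # g replaces the usual 1/sqrt(d) factor.
    logits = g * (Qn @ Kn.T)
    attn = softmax(logits, axis=-1)
    return attn @ V, attn


rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 3))
out, attn = qk_norm_attention(Q, K, V, g=10.0)
print(out.shape)  # (4, 3)
```

Multiplying <code>Q</code> by a large constant leaves the result essentially unchanged, which is the scale invariance the normalization is meant to provide.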
 
== See also ==