→Miscellaneous: QK norm
Ira Leviton (talk | contribs) m Fixed a reference. Please see Category:CS1 errors: dates.
Line 214:
'''Gradient normalization''' ('''GradNorm''')<ref>{{Cite journal |last1=Chen |first1=Zhao |last2=Badrinarayanan |first2=Vijay |last3=Lee |first3=Chen-Yu |last4=Rabinovich |first4=Andrew |date=2018-07-03 |title=GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks |url=https://proceedings.mlr.press/v80/chen18a.html |journal=Proceedings of the 35th International Conference on Machine Learning |language=en |publisher=PMLR |pages=794–803 |arxiv=1711.02257}}</ref> normalizes the magnitudes of per-task gradients during backpropagation in order to adaptively balance the losses of a multitask network.
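
The balancing objective can be illustrated with a simplified sketch, assuming the formulation in the cited paper: each task's gradient norm is pulled toward the average norm, scaled by that task's relative inverse training rate raised to a power <code>alpha</code>. The function names, the input lists, and the default <code>alpha=1.5</code> are hypothetical choices for illustration. A real implementation would compute the gradient norms with an automatic differentiation library and update the task loss weights by gradient descent on this objective.

```python
from statistics import mean


def gradnorm_targets(grad_norms, loss_ratios, alpha=1.5):
    """Target gradient norm for each task (illustrative sketch only).

    grad_norms  : per-task gradient norms of the weighted losses with
                  respect to the shared weights (assumed precomputed).
    loss_ratios : per-task ratios of current loss to initial loss,
                  where a smaller ratio means the task trains faster.
    alpha       : strength of the balancing force (hypothetical default).
    """
    avg_norm = mean(grad_norms)
    avg_ratio = mean(loss_ratios)
    # Relative inverse training rate: above 1 for tasks that lag behind.
    rel_rates = [ratio / avg_ratio for ratio in loss_ratios]
    return [avg_norm * rate ** alpha for rate in rel_rates]


def gradnorm_loss(grad_norms, loss_ratios, alpha=1.5):
    """Sum of absolute gaps between each gradient norm and its target."""
    targets = gradnorm_targets(grad_norms, loss_ratios, alpha)
    return sum(abs(g - t) for g, t in zip(grad_norms, targets))


if __name__ == "__main__":
    # Two tasks training at the same rate but with unequal gradient norms.
    print(gradnorm_targets([2.0, 4.0], [1.0, 1.0]))
    print(gradnorm_loss([2.0, 4.0], [1.0, 1.0]))
```

When all tasks train at the same rate, every target equals the average gradient norm, so the objective simply penalizes tasks whose gradients are larger or smaller than average.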
'''Query-Key normalization''' ('''QK-Norm''')<ref>{{Cite journal |last=Henry |first=Alex |last2=Dachapally |first2=Prudhvi Raj |last3=Pawar |first3=Shubham Shantaram |last4=Chen |first4=Yuxuan |date=November 2020
== See also ==