Revision as of 19:40, 10 October 2024 by Cosmia Nebula (→Adaptive; Tag: Visual edit)
Revision as of 19:43, 10 October 2024 by Cosmia Nebula (→Adaptive; Tag: Visual edit)
Line 155:
=== Adaptive ===
'''Adaptive layer norm''' ('''adaLN''') computes the <math>\gamma, \beta</math> of a LayerNorm not from the layer activations themselves, but from other data. It was first proposed for CNNs,<ref>{{Cite journal |last=Perez |first=Ethan |last2=Strub |first2=Florian |last3=De Vries |first3=Harm |last4=Dumoulin |first4=Vincent |last5=Courville |first5=Aaron |date=2018-04-29 |title=FiLM: Visual Reasoning with a General Conditioning Layer |url=https://ojs.aaai.org/index.php/AAAI/article/view/11671 |journal=Proceedings of the AAAI Conference on Artificial Intelligence |volume=32 |issue=1 |doi=10.1609/aaai.v32i1.11671 |issn=2374-3468}}</ref> and has been used effectively in the [[diffusion Transformer]] (DiT).<ref>{{Cite journal |last1=Peebles |first1=William |last2=Xie |first2=Saining |date=2023 |title=Scalable Diffusion Models with Transformers |url=https://openaccess.thecvf.com/content/ICCV2023/html/Peebles_Scalable_Diffusion_Models_with_Transformers_ICCV_2023_paper.html |language=en |pages=4195–4205 |arxiv=2212.09748}}</ref> For example, in DiT, the conditioning information (such as a text encoding vector) is processed by an MLP into <math>\gamma, \beta</math>, which are then applied in the LayerNorm modules of a Transformer.

== Weight normalization ==
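The adaptive layer norm (adaLN) idea described in the Adaptive section can be illustrated with a minimal, dependency-free sketch. It is not the DiT implementation: the two-layer ReLU MLP, the helper names (<code>make_mlp</code>, <code>ada_layer_norm</code>), the initialization, and the plain <math>\gamma \cdot \hat{x} + \beta</math> form are assumptions chosen for illustration. The only point it shows is that <math>\gamma, \beta</math> come from a conditioning vector rather than from fixed learned parameters.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and unit variance (no scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def linear(w, b, v):
    """Dense layer: w is a list of rows, b is the bias vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) + bi for row, bi in zip(w, b)]


def make_mlp(cond_dim, hidden_dim, feat_dim, seed=0):
    """Hypothetical two-layer MLP mapping a conditioning vector to (gamma, beta)."""
    rng = random.Random(seed)

    def init(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

    return {
        "w1": init(hidden_dim, cond_dim),
        "b1": [0.0] * hidden_dim,
        # The output layer produces 2 * feat_dim values: gamma first, then beta.
        "w2": init(2 * feat_dim, hidden_dim),
        "b2": [0.0] * (2 * feat_dim),
    }


def ada_layer_norm(x, cond, mlp):
    """Apply LayerNorm to x with gamma and beta predicted from cond."""
    hidden = [max(0.0, h) for h in linear(mlp["w1"], mlp["b1"], cond)]  # ReLU (assumed)
    out = linear(mlp["w2"], mlp["b2"], hidden)
    d = len(x)
    gamma, beta = out[:d], out[d:]
    return [g * n + b for g, n, b in zip(gamma, layer_norm(x), beta)]


# Example usage with made-up activations and a made-up conditioning vector.
x = [1.0, 2.0, 3.0, 4.0]
cond = [0.5, -1.0, 2.0]
mlp = make_mlp(cond_dim=3, hidden_dim=8, feat_dim=4)
y = ada_layer_norm(x, cond, mlp)
```

Changing <code>cond</code> changes the predicted <math>\gamma, \beta</math> and therefore the output, even when the input activations <code>x</code> stay the same.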

Normalization (machine learning): Difference between revisions