[[File:The-Transformer-model-architecture.png|thumb|upright=1.3|An illustration of the main components of the transformer model from the original paper, where layer normalization was applied after (instead of before) multi-headed attention.]]
At the 2017 [[NeurIPS]] conference, Google researchers introduced the [[transformer architecture]] in their landmark paper "[[Attention Is All You Need]]".<ref>{{cite journal |last1=Vaswani |first1=Ashish |author1-link= Ashish Vaswani |last2=Shazeer |first2=Noam |last3=Parmar |first3=Niki |last4=Uszkoreit |first4=Jakob |last5=Jones |first5=Llion |last6=Gomez |first6=Aidan N |author6-link= Aidan Gomez |last7=Kaiser |first7=Łukasz |last8=Polosukhin |first8=Illia |title=Attention is All you Need |journal=Advances in Neural Information Processing Systems |date=2017 |volume=30 |url=https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf |publisher=Curran Associates, Inc.}}</ref> The paper's goal was to improve upon 2014 [[Seq2seq]] technology, and it was based mainly on the [[attention (machine learning)|attention]] mechanism developed by Bahdanau et al. in 2014.<ref>{{cite arxiv |last1=Bahdanau |first1=Dzmitry |last2=Cho |first2=Kyunghyun |last3=Bengio |first3=Yoshua |title=Neural Machine Translation by Jointly Learning to Align and Translate |date=2014 |arxiv=1409.0473}}</ref> The following year in 2018, [[BERT (language model)|BERT]] was introduced and quickly became "ubiquitous".<ref>{{Cite journal|last1=Rogers|first1=Anna|last2=Kovaleva|first2=Olga|last3=Rumshisky|first3=Anna|date=2020|title=A Primer in BERTology: What We Know About How BERT Works|url=https://aclanthology.org/2020.tacl-1.54|journal=Transactions of the Association for Computational Linguistics|volume=8|pages=842–866|doi=10.1162/tacl_a_00349|arxiv=2002.12327|s2cid=211532403}}</ref> Though the original transformer has both encoder and decoder blocks, BERT is an encoder-only model.
Although decoder-only [[GPT-1]] was introduced in 2018, it was [[GPT-2]] in 2019 that caught widespread attention because [[OpenAI]] at first deemed it too powerful to release publicly, out of fear of malicious use.<ref>{{cite web |url=https://www.theguardian.com/technology/2019/feb/14/elon-musk-backed-ai-writes-convincing-news-fiction |title=New AI fake text generator may be too dangerous to release, say creators |last=Hern |first=Alex |work=[[The Guardian]] |date=14 February 2019}}</ref>