[[File:The-Transformer-model-architecture.png|thumb|upright=1.3|An illustration of the main components of the transformer model from the original paper, where layer normalization was applied after (instead of before) multi-headed attention.]]
At the 2017 [[NeurIPS]] conference, Google researchers introduced the [[transformer architecture]] in their landmark paper "[[Attention Is All You Need]]".<ref>{{cite journal |last1=Vaswani |first1=Ashish |author1-link= Ashish Vaswani |last2=Shazeer |first2=Noam |last3=Parmar |first3=Niki |last4=Uszkoreit |first4=Jakob |last5=Jones |first5=Llion |last6=Gomez |first6=Aidan N |author6-link= Aidan Gomez |last7=Kaiser |first7=Łukasz |last8=Polosukhin |first8=Illia |title=Attention is All you Need |journal=Advances in Neural Information Processing Systems |date=2017 |volume=30 |url=https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf |publisher=Curran Associates, Inc.}}</ref> The paper's goal was to improve upon 2014 [[Seq2seq]] technology, and it was based mainly on the [[attention (machine learning)|attention]] mechanism developed by Bahdanau et al. in 2014.<ref>{{cite arxiv |last1=Bahdanau |first1=Dzmitry |last2=Cho |first2=Kyunghyun |last3=Bengio |first3=Yoshua |title=Neural Machine Translation by Jointly Learning to Align and Translate |date=2014 |arxiv=1409.0473}}</ref> The following year in 2018, [[BERT (language model)|BERT]] was introduced and quickly became "ubiquitous".<ref>{{Cite journal|last1=Rogers|first1=Anna|last2=Kovaleva|first2=Olga|last3=Rumshisky|first3=Anna|date=2020|title=A Primer in BERTology: What We Know About How BERT Works|url=https://aclanthology.org/2020.tacl-1.54|journal=Transactions of the Association for Computational Linguistics|volume=8|pages=842–866|doi=10.1162/tacl_a_00349|arxiv=2002.12327|s2cid=211532403}}</ref> Though the original transformer has both encoder and decoder blocks, BERT is an encoder-only model.
Although decoder-only [[GPT-1]] was introduced in 2018, it was [[GPT-2]] in 2019 that caught widespread attention because [[OpenAI]] at first deemed it too powerful to release publicly, out of fear of malicious use.<ref>{{cite web |url=https://www.theguardian.com/technology/2019/feb/14/elon-musk-backed-ai-writes-convincing-news-fiction |title=New AI fake text generator may be too dangerous to release, say creators |last=Hern |first=Alex |work=[[The Guardian]] |date=14 February 2019}}</ref>