m spelling |
Add "decoder-only" |
Line 1:
{{Short description|Type of artificial neural network}}
{{Machine learning|Artificial neural network}}
A '''large language model''' ('''LLM''') is a [[language model]] notable for its ability to achieve general-purpose language generation and understanding. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive [[self-supervised learning|self-supervised]] and [[semi-supervised learning|semi-supervised]] training process.<ref name=":7">{{Cite web |date=2019-02-14 |title=Better Language Models and Their Implications |url=https://openai.com/blog/better-language-models/ |url-status=live |archive-url= https://web.archive.org/web/20201219132206/https://openai.com/blog/better-language-models/ |archive-date=2020-12-19 |access-date=2019-08-25 |website=OpenAI}}</ref> LLMs are [[artificial neural network]]s, the largest and most capable of which are built with a decoder-only [[Transformer (machine learning model)|transformer]]-based architecture. Some recent implementations are based on other architectures, such as [[recurrent neural network]] variants and [[Mamba (deep learning)|Mamba]] (a [[state-space representation|state space]] model).<ref>{{cite arXiv |eprint=2305.13048 |last1=Peng |first1=Bo |last2=Alcaide |first2=Eric |last3=Anthony |first3=Quentin |last4=Albalak |first4=Alon |last5=Arcadinho |first5=Samuel |last6=Biderman |first6=Stella |last7=Cao |first7=Huanqi |last8=Cheng |first8=Xin |last9=Chung |first9=Michael |last10=Grella |first10=Matteo |author11=Kranthi Kiran GV |last12=He |first12=Xuzheng |last13=Hou |first13=Haowen |last14=Lin |first14=Jiaju |last15=Kazienko |first15=Przemyslaw |last16=Kocon |first16=Jan |last17=Kong |first17=Jiaming |last18=Koptyra |first18=Bartlomiej |last19=Lau |first19=Hayden |author20=Krishna Sri Ipsit Mantri |last21=Mom |first21=Ferdinand |last22=Saito |first22=Atsushi |last23=Song |first23=Guangyu |last24=Tang |first24=Xiangru |last25=Wang |first25=Bolun |last26=Wind |first26=Johan S. 
|last27=Wozniak |first27=Stanislaw |last28=Zhang |first28=Ruichong |last29=Zhang |first29=Zhenyuan |last30=Zhao |first30=Qihang |title=RWKV: Reinventing RNNs for the Transformer Era |date=2023 |class=cs.CL |display-authors=1 }}</ref><ref>{{Cite web |last=Merritt |first=Rick |date=2022-03-25 |title=What Is a Transformer Model? |url=https://blogs.nvidia.com/blog/2022/03/25/what-is-a-transformer-model/ |access-date=2023-07-25 |website=NVIDIA Blog |language=en-US}}</ref><ref>{{Citation |last1=Gu |first1=Albert |title=Mamba: Linear-Time Sequence Modeling with Selective State Spaces |date=2023-12-01 |arxiv=2312.00752 |last2=Dao |first2=Tri}}</ref>
LLMs can be used for text generation, a form of [[Generative artificial intelligence|generative AI]], by taking an input text and repeatedly predicting the next token or word.<ref name="Bowman">{{cite arXiv |eprint=2304.00612 |class=cs.CL |first=Samuel R. |last=Bowman |title=Eight Things to Know about Large Language Models |year=2023}}</ref> Until 2020, [[Fine-tuning (machine learning)|fine-tuning]] was the only way to adapt a model to specific tasks. Larger models such as [[GPT-3]], however, can be [[prompt engineering|prompt-engineered]] to achieve similar results.<ref name="few-shot-learners">{{cite journal |last1=Brown |first1=Tom B. |last2=Mann |first2=Benjamin |last3=Ryder |first3=Nick |last4=Subbiah |first4=Melanie |last5=Kaplan |first5=Jared |last6=Dhariwal |first6=Prafulla |last7=Neelakantan |first7=Arvind |last8=Shyam |first8=Pranav |last9=Sastry |first9=Girish |last10=Askell |first10=Amanda |last11=Agarwal |first11=Sandhini |last12=Herbert-Voss |first12=Ariel |last13=Krueger |first13=Gretchen |last14=Henighan |first14=Tom |last15=Child |first15=Rewon |date=Dec 2020 |editor1-last=Larochelle |editor1-first=H. |editor2-last=Ranzato |editor2-first=M. |editor3-last=Hadsell |editor3-first=R. |editor4-last=Balcan |editor4-first=M.F. |editor5-last=Lin |editor5-first=H. |title=Language Models are Few-Shot Learners |url=https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. 
|volume=33 |pages=1877–1901 |last25=Chess |last20=Hesse |first20=Christopher |last21=Chen |first21=Mark |last22=Sigler |first22=Eric |last23=Litwin |first23=Mateusz |last24=Gray |first24=Scott |first26=Jack |first25=Benjamin |last26=Clark |last19=Winter |last27=Berner |first27=Christopher |last28=McCandlish |first28=Sam |last29=Radford |first29=Alec |last30=Sutskever |first30=Ilya |last31=Amodei |first31=Dario |first19=Clemens |first18=Jeffrey |last18=Wu |last16=Ramesh |first16=Aditya |last17=Ziegler |first17=Daniel M.}}</ref> They are thought to acquire knowledge about syntax, semantics and "ontology" inherent in human language corpora, but also inaccuracies and [[Algorithmic bias|biases]] present in the corpora.<ref name="Manning-2022">{{cite journal |last=Manning |first=Christopher D. |author-link=Christopher D. Manning |year=2022 |title=Human Language Understanding & Reasoning |url=https://www.amacad.org/publication/human-language-understanding-reasoning |journal=Daedalus |volume=151 |issue=2 |pages=127–138 |doi=10.1162/daed_a_01905 |s2cid=248377870|doi-access=free }}</ref>
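
The "repeatedly predicting the next token" loop described above can be illustrated with a deliberately tiny stand-in. The sketch below is not an LLM: it assumes a toy bigram count table in place of a neural network, whitespace-separated words as tokens, and greedy selection of the most frequent follower. The function names (<code>train_bigram</code>, <code>predict_next</code>, <code>generate</code>) and the sample corpus are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, which tokens follow it in the training text."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it has none."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, prompt, max_new_tokens):
    """Autoregressive loop: append one predicted token at a time."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        nxt = predict_next(counts, out[-1])
        if nxt is None:  # no known continuation, so stop early
            break
        out.append(nxt)
    return out


# Toy corpus (an assumption for this sketch, not real training data).
corpus = "the cat sat on the mat and the cat sat on the rug".split()
model = train_bigram(corpus)
print(" ".join(generate(model, ["the"], 4)))
```

An actual LLM replaces the count table with a neural network that conditions on the whole preceding context, and usually samples from the predicted distribution rather than always taking the most likely token; only the outer generate-and-append loop carries over.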