Revision as of 18:08, 18 February 2024 by Jellysandwich0 (minor edit, summary: spelling) compared with revision as of 21:09, 21 February 2024 by Yoderj (summary: Add "decoder-only")
{{Short description|Type of artificial neural network}}
{{Machine learning|Artificial neural network}}
A '''large language model''' ('''LLM''') is a [[language model]] notable for its ability to achieve general-purpose language generation and understanding. LLMs acquire these abilities by learning statistical relationships from text documents during a computationally intensive [[self-supervised learning|self-supervised]] and [[semi-supervised learning|semi-supervised]] training process.<ref name=":7">{{Cite web |date=2019-02-14 |title=Better Language Models and Their Implications |url=https://openai.com/blog/better-language-models/ |url-status=live |archive-url=https://web.archive.org/web/20201219132206/https://openai.com/blog/better-language-models/ |archive-date=2020-12-19 |access-date=2019-08-25 |website=OpenAI}}</ref> LLMs are [[artificial neural network]]s, the largest and most capable of which are built with a decoder-only [[Transformer (machine learning model)|transformer]]-based architecture.
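In a decoder-only transformer, each position may attend only to itself and to earlier positions. This is called a ''causal'' mask, and it lets the model be trained to predict every next token of a text in parallel without seeing the token it is predicting. The following is a minimal Python sketch of single-head causal self-attention, not the implementation of any particular model.

The function name <code>causal_self_attention</code> is hypothetical. As a simplifying assumption, the queries, keys and values are the input vectors themselves rather than learned projections of them. Skipping the future positions, as the loop does, is equivalent to setting their scores to negative infinity before the softmax.

```python
import math


def causal_self_attention(x):
    """Single-head scaled dot-product self-attention with a causal mask.

    x is a list of T vectors (lists of floats), one per token position.
    Simplifying assumption: queries, keys and values are the input vectors
    themselves; a real transformer first applies learned projection matrices.
    """
    d = len(x[0])
    out = []
    for i in range(len(x)):
        # Causal mask: position i scores only positions 0..i, so later
        # (future) tokens never influence the output at position i.
        scores = [
            sum(q * k for q, k in zip(x[i], x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        # Numerically stable softmax over the visible positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted sum of the visible value vectors.
        out.append(
            [sum(w * x[j][c] for j, w in enumerate(weights)) for c in range(d)]
        )
    return out


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = causal_self_attention(x)
print(y[0])  # → [1.0, 0.0]
```

Position 0 can attend only to itself, so its output equals its own input vector. Changing a later input, such as <code>x[2]</code>, leaves the outputs at earlier positions unchanged.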
Some recent implementations are based on other architectures, such as [[recurrent neural network]] variants and [[Mamba (deep learning)|Mamba]] (a [[state-space representation|state space]] model).<ref>{{cite arXiv |eprint=2305.13048 |last1=Peng |first1=Bo |last2=Alcaide |first2=Eric |last3=Anthony |first3=Quentin |last4=Albalak |first4=Alon |last5=Arcadinho |first5=Samuel |last6=Biderman |first6=Stella |last7=Cao |first7=Huanqi |last8=Cheng |first8=Xin |last9=Chung |first9=Michael |last10=Grella |first10=Matteo |author11=Kranthi Kiran GV |last12=He |first12=Xuzheng |last13=Hou |first13=Haowen |last14=Lin |first14=Jiaju |last15=Kazienko |first15=Przemyslaw |last16=Kocon |first16=Jan |last17=Kong |first17=Jiaming |last18=Koptyra |first18=Bartlomiej |last19=Lau |first19=Hayden |author20=Krishna Sri Ipsit Mantri |last21=Mom |first21=Ferdinand |last22=Saito |first22=Atsushi |last23=Song |first23=Guangyu |last24=Tang |first24=Xiangru |last25=Wang |first25=Bolun |last26=Wind |first26=Johan S. |last27=Wozniak |first27=Stanislaw |last28=Zhang |first28=Ruichong |last29=Zhang |first29=Zhenyuan |last30=Zhao |first30=Qihang |title=RWKV: Reinventing RNNs for the Transformer Era |date=2023 |class=cs.CL |display-authors=1 }}</ref><ref>{{Cite web |last=Merritt |first=Rick |date=2022-03-25 |title=What Is a Transformer Model? |url=https://blogs.nvidia.com/blog/2022/03/25/what-is-a-transformer-model/ |access-date=2023-07-25 |website=NVIDIA Blog |language=en-US}}</ref><ref>{{Citation |last1=Gu |first1=Albert |title=Mamba: Linear-Time Sequence Modeling with Selective State Spaces |date=2023-12-01 |arxiv=2312.00752 |last2=Dao |first2=Tri}}</ref>

LLMs can be used for text generation, a form of [[Generative artificial intelligence|generative AI]], by taking an input text and repeatedly predicting the next token or word.<ref name="Bowman">{{cite arXiv |eprint=2304.00612 |class=cs.CL |first=Samuel R. |last=Bowman |title=Eight Things to Know about Large Language Models |year=2023}}</ref> Until 2020, [[Fine-tuning (machine learning)|fine-tuning]] was the only way to adapt a model to specific tasks. Larger models such as [[GPT-3]], however, can be [[prompt engineering|prompt-engineered]] to achieve similar results.<ref name="few-shot-learners">{{cite journal |last1=Brown |first1=Tom B. |last2=Mann |first2=Benjamin |last3=Ryder |first3=Nick |last4=Subbiah |first4=Melanie |last5=Kaplan |first5=Jared |last6=Dhariwal |first6=Prafulla |last7=Neelakantan |first7=Arvind |last8=Shyam |first8=Pranav |last9=Sastry |first9=Girish |last10=Askell |first10=Amanda |last11=Agarwal |first11=Sandhini |last12=Herbert-Voss |first12=Ariel |last13=Krueger |first13=Gretchen |last14=Henighan |first14=Tom |last15=Child |first15=Rewon |last16=Ramesh |first16=Aditya |last17=Ziegler |first17=Daniel M. |last18=Wu |first18=Jeffrey |last19=Winter |first19=Clemens |last20=Hesse |first20=Christopher |last21=Chen |first21=Mark |last22=Sigler |first22=Eric |last23=Litwin |first23=Mateusz |last24=Gray |first24=Scott |last25=Chess |first25=Benjamin |last26=Clark |first26=Jack |last27=Berner |first27=Christopher |last28=McCandlish |first28=Sam |last29=Radford |first29=Alec |last30=Sutskever |first30=Ilya |last31=Amodei |first31=Dario |date=Dec 2020 |editor1-last=Larochelle |editor1-first=H. |editor2-last=Ranzato |editor2-first=M. |editor3-last=Hadsell |editor3-first=R. |editor4-last=Balcan |editor4-first=M.F. |editor5-last=Lin |editor5-first=H. |title=Language Models are Few-Shot Learners |url=https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=33 |pages=1877–1901}}</ref> LLMs are thought to acquire knowledge about syntax, semantics and "ontology" inherent in human language corpora, but also inaccuracies and [[Algorithmic bias|biases]] present in the corpora.<ref name="Manning-2022">{{cite journal |last=Manning |first=Christopher D. |author-link=Christopher D. Manning |year=2022 |title=Human Language Understanding & Reasoning |url=https://www.amacad.org/publication/human-language-understanding-reasoning |journal=Daedalus |volume=151 |issue=2 |pages=127–138 |doi=10.1162/daed_a_01905 |s2cid=248377870 |doi-access=free }}</ref>
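The generation loop, taking an input text and repeatedly predicting the next token, can be illustrated with a deliberately tiny stand-in for an LLM. This Python sketch assumes a whitespace tokenizer and replaces the neural network with a bigram count table, so it shows only the control flow. The training labels come from the text itself, since each token's target is simply the token after it. Generation then repeatedly picks a next token and appends it to the sequence.

The function names <code>train_bigram</code> and <code>generate</code> are hypothetical. Real LLMs differ in three ways:
* they operate on subword tokens;
* they condition on the whole preceding context rather than only the last token;
* they often sample from the predicted distribution instead of always taking the most likely token (the greedy decoding used here).

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which token follows which: a toy stand-in (assumption) for the
    statistical relationships an LLM learns. The signal is self-supervised:
    each token's label is simply the next token in the same text."""
    tokens = text.split()  # naive whitespace tokenizer (assumption)
    counts = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    return counts


def generate(counts, prompt, max_new_tokens=5):
    """Greedy autoregressive generation: predict the most likely next token,
    append it to the sequence, and repeat."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        followers = counts.get(tokens[-1])
        if not followers:
            break  # no continuation was ever observed after this token
        next_token, _ = followers.most_common(1)[0]
        tokens.append(next_token)
    return " ".join(tokens)


model = train_bigram("the cat sat on the mat the cat ate the fish")
print(generate(model, "the", max_new_tokens=4))  # → the cat sat on the
```

In this toy corpus, ''cat'' is followed once by ''sat'' and once by ''ate''. The tie is broken in favour of the continuation seen first, because <code>Counter.most_common</code> orders elements with equal counts by first occurrence.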

Large language model: Difference between revisions