LLMs can be used for text generation, a form of [[Generative artificial intelligence|generative AI]], by taking an input text and repeatedly predicting the next token or word.<ref name="Bowman">{{cite arXiv |eprint=2304.00612 |class=cs.CL |first=Samuel R. |last=Bowman |title=Eight Things to Know about Large Language Models |year=2023}}</ref> Until 2020, [[Fine-tuning (machine learning)|fine-tuning]] was the only way a model could be adapted to accomplish specific tasks. Larger models such as [[GPT-3]], however, can be [[prompt engineering|prompt-engineered]] to achieve similar results.<ref name="few-shot-learners">{{cite journal |last1=Brown |first1=Tom B. |last2=Mann |first2=Benjamin |last3=Ryder |first3=Nick |last4=Subbiah |first4=Melanie |last5=Kaplan |first5=Jared |last6=Dhariwal |first6=Prafulla |last7=Neelakantan |first7=Arvind |last8=Shyam |first8=Pranav |last9=Sastry |first9=Girish |last10=Askell |first10=Amanda |last11=Agarwal |first11=Sandhini |last12=Herbert-Voss |first12=Ariel |last13=Krueger |first13=Gretchen |last14=Henighan |first14=Tom |last15=Child |first15=Rewon |date=Dec 2020 |editor1-last=Larochelle |editor1-first=H. |editor2-last=Ranzato |editor2-first=M. |editor3-last=Hadsell |editor3-first=R. |editor4-last=Balcan |editor4-first=M.F. |editor5-last=Lin |editor5-first=H. |title=Language Models are Few-Shot Learners |url=https://proceedings.neurips.cc/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. 
|volume=33 |pages=1877–1901 |last25=Chess |last20=Hesse |first20=Christopher |last21=Chen |first21=Mark |last22=Sigler |first22=Eric |last23=Litwin |first23=Mateusz |last24=Gray |first24=Scott |first26=Jack |first25=Benjamin |last26=Clark |last19=Winter |last27=Berner |first27=Christopher |last28=McCandlish |first28=Sam |last29=Radford |first29=Alec |last30=Sutskever |first30=Ilya |last31=Amodei |first31=Dario |first19=Clemens |first18=Jeffrey |last18=Wu |last16=Ramesh |first16=Aditya |last17=Ziegler |first17=Daniel M.}}</ref> They are thought to acquire knowledge about syntax, semantics and "ontology" inherent in human language corpora, but also inaccuracies and [[Algorithmic bias|biases]] present in the corpora.<ref name="Manning-2022">{{cite journal |last=Manning |first=Christopher D. |author-link=Christopher D. Manning |year=2022 |title=Human Language Understanding & Reasoning |url=https://www.amacad.org/publication/human-language-understanding-reasoning |journal=Daedalus |volume=151 |issue=2 |pages=127–138 |doi=10.1162/daed_a_01905 |s2cid=248377870|doi-access=free }}</ref>
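The autoregressive generation described above — repeatedly predicting the next token and appending it to the input — can be sketched as a minimal loop. The "model" below is a hypothetical toy bigram lookup table standing in for a real trained LLM, which would instead score all tokens in its vocabulary with a neural network:

```python
# Toy stand-in for a trained language model: maps the last token to a
# "most likely" next token. A real LLM conditions on the whole context.
TOY_BIGRAM_MODEL = {
    "the": "cat",
    "cat": "sat",
    "sat": "on",
    "on": "the",
}

def predict_next_token(tokens):
    """Return the predicted next token given the context (toy stand-in)."""
    return TOY_BIGRAM_MODEL.get(tokens[-1], "<eos>")

def generate(prompt_tokens, max_new_tokens=5):
    """Generate text by repeatedly predicting and appending the next token."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = predict_next_token(tokens)
        if nxt == "<eos>":  # stop when the model emits an end-of-sequence token
            break
        tokens.append(nxt)
    return tokens

print(generate(["the"]))  # ['the', 'cat', 'sat', 'on', 'the', 'cat']
```

The loop structure is the same in production systems; only the predictor changes, and sampling strategies (greedy, temperature, nucleus) replace the single lookup.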
Some notable LLMs are [[OpenAI]]'s [[Generative pre-trained transformer|GPT]] series of models (e.g., [[GPT-3.5]] and [[GPT-4]], used in [[ChatGPT]] and [[Microsoft Copilot]]) and [[Google]]'s [[PaLM]] and [[Gemini (language model)|Gemini]] (used in [[Google Bard|Bard]]).
==History==