Talk:Large language model: Difference between revisions

 
== Highly biased "impact" ==
There is nothing positive about LLMs in the whole section. How come nothing about their uses, statistics, etc. is there? <!-- Template:Unsigned IP --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/2601:184:4081:3050:B522:2FBF:C7C5:960B|2601:184:4081:3050:B522:2FBF:C7C5:960B]] ([[User talk:2601:184:4081:3050:B522:2FBF:C7C5:960B#top|talk]]) 14:15, 5 October 2024 (UTC)</small> <!--Autosigned by SineBot-->
 
:@[[User:2601:184:4081:3050:B522:2FBF:C7C5:960B|2601:184:4081:3050:B522:2FBF:C7C5:960B]] I'm not sure I understand your concerns; this is a largely technical topic. Some readers may be interested in usage statistics, but there are so many different LLMs out there that fairly and accurately representing that in this article may be difficult. Did you have a reliable secondary source in mind? [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 11:22, 6 October 2024 (UTC)
 
== Model list as prose? ==
::In that case, I think that the 'hallucination' content should be moved into another section where it can be put into context (it's a one-sentence section, so it should be merged regardless of the confusion this term causes). For now, I'll just merge it with the previous paragraph about the ability of the LLM to often regurgitate correct facts, though it may be more appropriate in the section about emergent properties or applications. Based on the academic work cited in the Hallucination article, it sounds like hallucination is only an issue when the LLM is used for a specific application (in which case I'd just call it an error, but I guess I'm old fashioned). [[User:AdamChrisR|AdamChrisR]] ([[User talk:AdamChrisR|talk]]) 13:01, 8 April 2023 (UTC)
:::"Hallucination" is [[figurative speech]] used as a [[term of art]]. I don't see how it can be marketing hype. —[[User talk:Alalch E.|Alalch E.]] 21:04, 8 April 2023 (UTC)
:There is a "more technical" term I have heard, 'confabulation'; it might be worth investigating the use of that term. [[Special:Contributions/2603:6080:3CF0:9560:3120:55CD:D2C8:D3A3|2603:6080:3CF0:9560:3120:55CD:D2C8:D3A3]] ([[User talk:2603:6080:3CF0:9560:3120:55CD:D2C8:D3A3|talk]]) 12:36, 28 August 2025 (UTC)
 
== Scaling laws section ==
:Agreed, thank you for the feedback. I've gone and reworked much of the introduction, happy to hear any further notes. [[User:StereoFolic|StereoFolic]] ([[User talk:StereoFolic|talk]]) 16:31, 24 July 2023 (UTC)
::Thanks! [[User:D rock naut|D rock naut]] ([[User talk:D rock naut|talk]]) 17:20, 26 July 2023 (UTC)
:Agree that this article is extremely jargon-y and basically unintelligible to the average reader. [[User:Twistybrastrap|Twistybrastrap]] ([[User talk:Twistybrastrap|talk]]) 02:29, 14 May 2025 (UTC)
 
== Training cost in the List ==
:I'm not against a separate list, but I think that a list of most influential models is still needed here (bert, gpts, llama, claude, gemini, mistral, etc) [[User:Artem.G|Artem.G]] ([[User talk:Artem.G|talk]]) 10:28, 28 January 2024 (UTC)
::I agree that the foundational models should be talked about here (maybe just in prose) but that the long list would be better as its own article in [[list of large language models]]. [[User:PopoDameron|<span style="color:#305090;padding-right:0.5pt"><i>Popo</i></span><span style="color:black"><b>Dameron</b></span>]] ⁠[[User_talk:PopoDameron|<span style="color:#606060;font-size:11.5px">talk</span>]] 21:07, 29 January 2024 (UTC)
:::I did create a page last month, but deleted it after I realised this existed <span style="background:#F84;padding:3px;border-radius:2px">[[User:Mr vili|<span style="color:#FFF">Mr Vili</span>]] [[User talk:Mr vili|<span style="background:#FFF;padding:0 4px;color:#F84;border-radius: 2px;">talk</span>]]</span> 04:50, 23 February 2024 (UTC)
:*{{Strikethrough |'''Support moving list''' due to article length.}} [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 16:53, 1 February 2024 (UTC)
::'''Reduce to non-table list of notable links.''' [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 17:58, 1 February 2024 (UTC)
:*:Nice catch, the article appears to be about 900 words below the 6k readable prose threshold. However, the list in table format feels long and unnecessary here. Several items appear to be sourced to corporate blog posts or preprints. If the list is to remain here, it could be reduced to a non-table list of notable links. [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 17:51, 1 February 2024 (UTC)
:I had actually been thinking about the list table recently. I would have suggested creating a modified, chatbot-listing version of the table at [[List of chatbots]], to which [[Comparison of user features of chatbots]] could also probably be merged. –<span style="box-shadow: 0px 0px 12px red;border-radius:9em;padding:0 2px;background:#D00">[[User:Gluonz|<span style="color:#AFF">'''Gluonz'''</span>]]<sup>''' [[User talk:Gluonz|talk]] [[Special:Contributions/Gluonz|contribs]]'''</sup></span> 17:15, 1 February 2024 (UTC)
:I think we should limit the list to only base models, perhaps over a certain parameter size, or merge multiple LLM versions into the same listing.
:This will get excessively long over time, but the prohibitive cost of training '''large''' language models should prevent it from spiraling out of control <span style="background:#F84;padding:3px;border-radius:2px">[[User:Mr vili|<span style="color:#FFF">Mr Vili</span>]] [[User talk:Mr vili|<span style="background:#FFF;padding:0 4px;color:#F84;border-radius: 2px;">talk</span>]]</span> 04:49, 23 February 2024 (UTC)
 
== Reduce emphasis on non-transformer LLMs? ==
 
The opening paragraph includes the text, "Some recent implementations are based on other architectures, such as recurrent neural network variants and Mamba (a state space model).[2][3][4]". I believe this text should be moved MUCH later in the article, if it is mentioned at all. I don't think the citations included are sufficient to demonstrate the notability of these alternatives to the dominant architecture. Is there agreement on this? --[[User:Yoderj|Yoderj]] ([[User talk:Yoderj|talk]]) 21:15, 21 February 2024 (UTC)
 
:Agree; no major LLM is based on alternative architectures, so it's undue to mention it in the lead. [[User:Artem.G|Artem.G]] ([[User talk:Artem.G|talk]]) 21:34, 21 February 2024 (UTC)
 
== It is weird that this article attributes intelligence to LLMs ==
 
The first sentence of this article claims that LLMs achieve “understanding”. Later it attributes “knowledge” to the model.
 
Using these terms implies consciousness in the models, which is a very strange thing to do. Maybe even misleading.
 
In general, the article fails as an encyclopedia article in that it doesn’t have enough basic information about LLMs to enable a nontechnical reader to obtain a basic understanding. It is a jumble of unexplained jargon peppered through passages where the topic sentences don’t match the rest of the paragraphs. And then it mixes in these implications that the machines are thinking.
 
It could use some help. [[User:Lot49a|<span style="color:blue">Lot</span>]] [[Special:Contributions/Lot49a|<span style="color:orange">'''49a'''</span>]]<sup>[[User talk:Lot49a|<span style="color:blue">talk</span>]]</sup> 12:12, 4 March 2024 (UTC)
:{{ping|Lot49a}} I agree that more basics need explaining, but I don't think the claims of understanding are being made implicitly; they are being stated explicitly with reference to the opinions of scholars. Understanding implies consciousness? Citation needed for that! :) The tests for understanding of language aren't tests of whether something is conscious, so this sounds like your own personal theory of understanding. Two points about adding "citation needed": 1) This needs to be done by [[Help:Template|adding a template]], not just typing "citation needed" into the article; 2) Ideally the lead paragraph of an article should have no citations, because everything stated in that paragraph should be a summary of the full article, where the facts should be cited. You were right to highlight a problem with that sentence because the full article says it's controversial whether LLMs achieve understanding, but the lead sentence said categorically that they do. [[User:MartinPoulter|MartinPoulter]] ([[User talk:MartinPoulter|talk]]) 12:48, 4 March 2024 (UTC)
::Thanks. I appreciate the feedback and also the editor who took some time to improve the introduction. I made an attempt to organize the introduction a little more but since I don't really understand this topic there's not much I can do to help improve it for fear of breaking it. But it really needs work. [[User:Lot49a|<span style="color:blue">Lot</span>]] [[Special:Contributions/Lot49a|<span style="color:orange">'''49a'''</span>]]<sup>[[User talk:Lot49a|<span style="color:blue">talk</span>]]</sup> 23:52, 4 March 2024 (UTC)
 
== Reasoning ==
 
The most important aspects of LLMs are their ability to do limited reasoning and to answer specific questions, in my understanding. People are also excited about their coding abilities. The intro doesn't say that. Should it be added? Currently the intro says a variety of text generation tasks, which is rather bland and uninteresting. I can dig up references if that will help. Per [[wp:lead]] the lead should contain a summary of the most important content. [[User:Daniel.Cardenas|Daniel.Cardenas]] ([[User talk:Daniel.Cardenas|talk]]) 16:18, 15 March 2024 (UTC)
:Thank you for the offer. More fundamentally, I am beginning to feel that "language" within the term "LLM" is a misnomer. Any Spanish dictionary or Russian grammar book contains "language" but an LLM goes beyond language by achieving '''communication''' and even, as you point out, '''reasoning'''. If I am right (and I am no expert), the summary should mention that LLM is a misnomer (which I think has confused several commentators here). [[Special:Contributions/2A00:23C6:54AD:5701:35A2:4775:8118:3606|2A00:23C6:54AD:5701:35A2:4775:8118:3606]] ([[User talk:2A00:23C6:54AD:5701:35A2:4775:8118:3606|talk]]) 13:31, 28 April 2024 (UTC)
:Reasoning is a recent development caused by smarter prompting practices and, later, fine-tuning on reasoning structures. It is not an ability of every LLM. On LLMs that have it without fine-tuning, it is due to a more general ability to follow instructions and prompts. Really, the ability to accept a verbal prompt, or to "extend an LLM with just words", is the foundation of almost every cool LLM feature you see.
:And where does the ability to follow a verbal prompt come from? GPT-3 tells us that even without fine-tuning, LLMs just know to do that when predicting words. It turns out real human language contains a layer of distribution beyond just grammar, called "making sense". Sentences like "colorless green ideas sleep furiously" do not fall into the typical distribution.
:Coding ability, on the other hand, has been mostly inherent, coming from processing a language as a language. Commercial deployment of coding models predates commercial deployment of reasoning by a lot. [[User:Artoria2e5|Artoria]][[User talk:Artoria2e5|2e5]] <small style="font-weight:lighter">[[Special:Contributions/Artoria2e5|🌉]]</small> 11:47, 25 June 2025 (UTC)
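:A minimal sketch of what "extending an LLM with just words" looks like in practice; <code>generate</code> is a hypothetical stand-in for any text-completion API, not a specific library:
<syntaxhighlight lang="python">
# Reasoning elicited purely through the prompt: the API call is
# identical, only the words change. generate() is hypothetical.

def generate(prompt: str) -> str:
    """Placeholder for a real completion API call."""
    return "<model output>"

question = ("A bat and a ball cost $1.10 in total. The bat costs "
            "$1.00 more than the ball. How much does the ball cost?")

direct = generate(question)                               # zero-shot answer
cot = generate(question + "\nLet's think step by step.")  # chain-of-thought
</syntaxhighlight>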
 
== Section "Tool use": Unfinished sentence? ==
 
There seems to be an unfinished sentence (or maybe headline?) here:
"This basic strategy can be sophisticated with multiple attempts of generated programs, and other sampling strategies. '''Cost Savings and Reduced Vendor Dependency'''" [[User:Meltron1|Meltron1]] ([[User talk:Meltron1|talk]]) 16:02, 19 April 2024 (UTC)
 
== Transformer architecture ==
 
Recent edits in the lead section suggest that all LLMs use the transformer architecture. However, this seems to contradict the section "Alternative architecture". [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 22:08, 1 June 2024 (UTC)
 
:well, all of the "largest and most capable" models, including all models in the list, are based on the transformer. Alt architectures exist, though they are mostly experiments. [[User:Artem.G|Artem.G]] ([[User talk:Artem.G|talk]]) 05:39, 2 June 2024 (UTC)
 
== Privacy ==
 
Example:
User:
"is this conversation sent as data over the internet to external parties?"
Dolphin Phy (latest):
"Yes, the content of your conversation can be considered as data being transmitted over the internet, which can potentially reach external parties through various channels like email, social media platforms, or other online communication tools. It's important to be mindful of privacy and security when sharing personal information or sensitive data online."
 
User:
"is this conversation, that we are having now, subject of being sent over the internet to external parties? "
Dolphin Phy (latest):
"Yes, the content of your conversation is indeed being transmitted over the internet, which could potentially reach external parties through various channels like email, social media platforms, or other online communication tools. It's essential to be aware of privacy and security when sharing personal information or sensitive data in such conversations."
 
User:
"what do external parties use this conversation for?"
Dolphin Phy (latest):
"External parties might use your conversation for various purposes, depending on the context and content. These could include:
 
1. Personal or professional networking: They may be interested in learning more about you or discussing common interests or experiences.
2. Research or analysis: External parties might be gathering information about your opinions, preferences, or experiences to better understand a certain topic or segment of the population.
3. Marketing or advertising: The conversation might be used by businesses for targeted marketing efforts, such as understanding customer preferences or identifying potential customers.
4. Data collection: In some cases, companies or organizations may collect conversations like this for research purposes or to improve their products and services.
5. Legal or regulatory reasons: If the content of your conversation is related to a legal matter or breach of privacy, external parties might use it as evidence in court proceedings." <!-- Template:Unsigned IP --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[Special:Contributions/188.27.130.106|188.27.130.106]] ([[User talk:188.27.130.106#top|talk]]) 01:56, 20 September 2024 (UTC)</small> <!--Autosigned by SineBot-->
 
== Corpus size ==
 
Please explain and clarify the [[Units of information|units]]. I read the following: (1) tokens, (2) words, and (3) bytes, for one and the same parameter or attribute "corpus size". Kr [[User:17387349L8764|17387349L8764]] ([[User talk:17387349L8764|talk]]) 09:10, 1 October 2024 (UTC)
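:A minimal sketch of how the three units diverge for the same text; the four-bytes-per-token figure is a common rule of thumb for English under BPE-style tokenizers, not an exact conversion:
<syntaxhighlight lang="python">
corpus = "Large language models are trained on vast amounts of text."

n_bytes = len(corpus.encode("utf-8"))  # unit (3): bytes
n_words = len(corpus.split())          # unit (2): whitespace-separated words
n_tokens = n_bytes / 4                 # unit (1): rough token estimate

print(n_bytes, n_words, round(n_tokens))  # 58 10 14
</syntaxhighlight>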
 
== List of LLM models ==
 
Should be exported/separated into a unique article. This lemma should focus on the technicalities of LLMs, not an infinitely expanding set of them. Remember to read [[Wikipedia:Stand-alone lists]]. Kr [[User:17387349L8764|17387349L8764]] ([[User talk:17387349L8764|talk]]) 09:29, 1 October 2024 (UTC)
 
:@[[User:17387349L8764|17387349L8764]] This has been discussed before; I agree that the list is long and likely to get longer. [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 01:32, 2 October 2024 (UTC)
 
== Pre-training Section ==
 
It seems that the concept of pre-training is not introduced in this article, while pre-training is very important (if not necessary) in the ___domain of LLMs. So I think a separate section on pre-training is necessary, or we can put it in the section "Training and architecture".
 
-- [[User:Wonderwind2002|Wonderwind2002]] ([[User talk:Wonderwind2002|talk]]) 03:58, 9 December 2024 (UTC)
 
== Too technical ==
 
I'm flagging this page for being overly technical. WP:TECHNICAL says we should consider our audience when deciding how technical to get. Given how prevalent LLMs are in daily life, I think the average person visiting this page has a high school or college level education, and would find the page in its current state unintelligible. The people who would benefit from highly technical explanations of how LLMs work can easily find that information in other places. [[User:Twistybrastrap|Twistybrastrap]] ([[User talk:Twistybrastrap|talk]]) 02:39, 14 May 2025 (UTC)
: I'm all for making the article less technical. However, I'd like to remind other editors that the focus should ''not'' be to remove facts that are too technical. It should be to explain them in a way that is understandable by most people. --[[User:Mlewan|Mlewan]] ([[User talk:Mlewan|talk]]) 06:19, 14 May 2025 (UTC)
:I would note that "large language model" is a more theoretical topic than chatbots or other implementations. Many readers who use computers would not be interested in every computer science topic. [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 16:10, 15 May 2025 (UTC)
:I think a third paragraph in the lead section could be added to summarize notable implementations like ChatGPT. As long as the information is well-organized, I don't think the LLM article is a high priority for technical re-explanation. Do you have any specific passages to highlight? [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 12:14, 16 May 2025 (UTC)
::Expanding the [[Chatbot]] page might make more sense, then, yeah. Maybe with a link from the LLM page to the Chatbot page, in case people go looking for info about how chatbots work but look up "LLM". [[User:Twistybrastrap|Twistybrastrap]] ([[User talk:Twistybrastrap|talk]]) 13:31, 16 May 2025 (UTC)
 
== Seeking consensus for rephrasing lead ==
 
Following the request above for improving readability, I had modified the first sentence (version A):
 
A '''large language model''' ('''LLM''') is a [[machine learning]] model designed for [[natural language processing]] tasks, especially [[Natural language generation|language generation]]. LLMs are [[Language model|language models]] with many parameters, and are trained with [[self-supervised learning]] on a vast amount of text.
 
thus (version B)
 
A '''large language model''' ('''LLM''') is a [[Model#Conceptual model|computational model]] designed to mimic the human ability to [[Natural language generation|generate]] and [[natural language processing|process natural language]]. LLMs are [[Language model|language models]] with many parameters, based on [[machine learning]], and are trained with [[self-supervised learning]] on a vast amount of text.
 
Can I ask Wikipedians to comment on, and potentially support, my version (B)?
 
I believe version A is deficient in several respects. First, "machine learning" is not an adjective, and therefore "machine learning model" is grammatically ambiguous and misleading. Does it mean "model of machine learning"? Or does it mean "model using machine learning"? Or does it even mean "model which has been developed after having completed machine learning"? What a confusing way to start a Wikipedia article. Next, version A does not explain the word "model". A model is a representation of something, and is a word used in many disciplines; for example, computational models exist in weather forecasting, whereby the computer algorithm represents the real-world weather system. Similarly, here the large language model is a computer algorithm which mimics real-world human language generation and processing. I get the impression that version A is unaware of the difference between a model, an algorithm and a computer program. [[Special:Contributions/2A00:23C6:54FE:FB01:8966:12FF:B8AF:BC21|2A00:23C6:54FE:FB01:8966:12FF:B8AF:BC21]] ([[User talk:2A00:23C6:54FE:FB01:8966:12FF:B8AF:BC21|talk]]) 15:55, 5 June 2025 (UTC)
 
:Thanks for explaining your rationale.
:In the noun phrase "machine learning model", "machine learning" is indeed not an adjective, but rather a [[noun adjunct]]. The term is actually [https://scholar.google.com/scholar?hl=en&as_sdt=0%2C5&q=%22machine+learning+model%22&btnG= commonly used in the field], and means a model developed using machine learning. Notice that it is the same construction for "language model", in which "language" is a noun.
:I also don't really like the term "mimic the human ability to", I think the sentence is cleaner and more solidly accurate without that.
:Defining precisely what ML researchers mean when they say "model" is non-trivial and quite philosophical. I believe that the term "model" has a particularly broad meaning in machine learning and can refer to almost any system that performs prediction, clustering, or generation, and that it is not limited to systems that simulate something. Anyway, I believe version A can be forgiven for not making the meaning of "model" explicit here, although I agree that the term "computational model" is accurate.
:That being said, even though I like version A, I would be ok with something like this (version C):
:A '''large language model''' ('''LLM''') is a [[computational model]] designed for [[natural language processing]] tasks, especially [[Natural language generation|language generation]]. LLMs are [[machine learning]] models with many parameters, and are trained with [[self-supervised learning]] on a vast amount of text.
:Feedback from other editors is welcome. [[User:Alenoach|Alenoach]] ([[User talk:Alenoach|talk]]) 19:15, 5 June 2025 (UTC)
::Thanks for engaging. Version C is a slight improvement, but you state the problem yourself: "machine learning" needs to be explained, at first mention, and if "model" is nebulous in the computing field, the term should be avoided as far as possible. Thirdly, what is the difference between self-supervised learning and unsupervised learning? These three terms of jargon kill the intelligibility of the lead. So version C needs work as follows:
::A large language model (LLM) is a computational model designed for natural language processing tasks, especially language generation. LLMs are models (models here being systems [computer programs?] that predict, cluster, or generate [predictions/clusters/output reflecting the real world? Or any artefact?]) generated by machine learning. These models have many parameters ["many" - more than a million/a trillion?], and are trained [presumably "have been trained"?] with self-supervised [grammatical contradiction - either they train themselves, or they are trained by someone else] learning on a vast amount [define vast - ten times the corpus of Wikipedia?] of text.
::I would be grateful if you could explain the concepts in my square parentheses, and then we can perhaps home in on the final version D. [[Special:Contributions/2A00:23C6:54FE:FB01:E88B:365A:1C6A:D477|2A00:23C6:54FE:FB01:E88B:365A:1C6A:D477]] ([[User talk:2A00:23C6:54FE:FB01:E88B:365A:1C6A:D477|talk]]) 20:43, 5 June 2025 (UTC)
:::I agree that "self-supervised learning" may introduce unnecessary complexity to the lead. My point was not to give a precise definition of the term "model" in machine learning, which I may fail to do, just to indicate that the usage of the term is somewhat different and broader in the context of machine learning than e.g. in physics. I think it's better to just say that it's a "machine learning model" or a "computational model". For "["many" - more than a million/a trillion?]", I would say "typically more than a billion", but "many" may be acceptable for simplicity. For "[define vast - ten times the corpus of Wikipedia?]", there is also no strict criterion.
:::I would like to know by the way what other editors think is best for the first paragraph. [[User:Alenoach|Alenoach]] ([[User talk:Alenoach|talk]]) 23:40, 6 June 2025 (UTC)
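:::For concreteness, a back-of-the-envelope sketch of where "more than a billion" comes from; the shapes below are illustrative GPT-2 XL-like values, not any particular model's published configuration:
<syntaxhighlight lang="python">
# Rough transformer parameter count: ~12*d^2 weights per layer
# (attention + MLP projections) plus the token-embedding matrix.

d_model, n_layers, vocab = 1600, 48, 50257   # illustrative shapes

per_layer = 12 * d_model ** 2                # QKV/output + MLP weights
embeddings = vocab * d_model
total = n_layers * per_layer + embeddings

print(f"{total / 1e9:.2f}B parameters")      # 1.55B
</syntaxhighlight>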
::::Thanks for the billion info, that should definitely be mentioned in the lead for clarity. As for "machine learning model": all three words are unclear. You concede that "model" is unclear to you, so let us put that aside for the moment and see what other editors say. "Machine learning" is grammatically ambiguous: has the LLM been finalised, using machine learning, before the LLM is offered to the general user? Or does the model continue machine learning and continue refining itself while the general user is using it? [[Special:Contributions/2A00:23C6:54FE:FB01:2484:E79:B45A:5434|2A00:23C6:54FE:FB01:2484:E79:B45A:5434]] ([[User talk:2A00:23C6:54FE:FB01:2484:E79:B45A:5434|talk]]) 04:25, 7 June 2025 (UTC)
:::::It's not really that the meaning of "model" is unclear, it's more that it is not easy to rigorously define. But that is the correct term to use. For the training, LLMs are pre-trained with [[self-supervised learning]] and usually then fine-tuned (potentially also regularly fine-tuned after being released). One subtlety is that even though most LLMs are [[generative pretrained transformer|GPTs]], some have a different purpose (e.g. [[BERT (language model)|BERT]] does not generate text, it just encodes text as vectors). [[User:Alenoach|Alenoach]] ([[User talk:Alenoach|talk]]) 18:49, 7 June 2025 (UTC)
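::::::To illustrate the BERT point, a minimal sketch using the Hugging Face <code>transformers</code> library (the model name and input text are just examples):
<syntaxhighlight lang="python">
# BERT encodes text as vectors rather than generating text: the
# output is one hidden-state vector per input token.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("A large language model.", return_tensors="pt")
vectors = model(**inputs).last_hidden_state
print(vectors.shape)  # (1, number_of_tokens, 768)
</syntaxhighlight>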
::::::It is interesting that you, presumably a computer professional, are unable to explain your own jargon. So this is a major reason why there are so many complaints about this unintelligible Wikipedia article. Can you please ask around some of your older colleagues who may have witnessed the introduction of the term "model" in computing, for their definition what it means? Meanwhile I will look up the literature and get back to you. On your other explanations for self-supervised learning, I am afraid it is unclear again what you are saying because you have an unfortunate habit of using grammatical passive voice, making it unclear whether a human or an algorithm is performing the self-supervised learning and fine-tuning. This use of passive voice is also fatal for the intelligibility of the lead. So please rephrase your explanation - who is doing what? [[Special:Contributions/2A00:23C6:54FE:FB01:C124:A1B:3CA4:1254|2A00:23C6:54FE:FB01:C124:A1B:3CA4:1254]] ([[User talk:2A00:23C6:54FE:FB01:C124:A1B:3CA4:1254|talk]]) 21:18, 7 June 2025 (UTC)
:::::::Please don't overfocus on the lead and the definition of "model". There are other technical sections that are more in need of a rework. I agree that it may be better without the term "self-supervised learning" and that a link from "model" to [[computational model]] may be useful, but a definition of the term "model" in this introduction is out-of-scope. [[User:Alenoach|Alenoach]] ([[User talk:Alenoach|talk]]) 21:43, 7 June 2025 (UTC)
:@[[User:Twistybrastrap|Twistybrastrap]], @[[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]], no obligation, but in case you have an opinion on what needs to be changed to address the complexity issues in the article, or about the discussion above, that would be welcome. [[User:Alenoach|Alenoach]] ([[User talk:Alenoach|talk]]) 21:48, 7 June 2025 (UTC)
::As promised, I am reporting my search results on what a model is. When I ask Google "What is a model in the field of computing?", the AI answer is "In the computer science field, a model refers to a representation of a real-world system, process, or phenomenon that can be used to study, analyze, and predict its behavior. It's essentially a computer program or set of algorithms designed to mimic or simulate aspects of the real world." '''This agrees with what I originally wrote - an LLM is designed to mimic the human ability to produce and process language.''' Furthermore, I logged into the Oxford English Dictionary of 2002, and among the many explanations of usage of the noun "model" since 1570 (starting with the model of a sailing ship), there is not a single entry referring to computers - this implies that someone introduced the term "model" in computer science quite recently, so it should be possible to ask older colleagues what they mean when they use this term, and fix the oft-criticised jargon problem in the Wikipedia article accordingly. Please help, everyone. [[Special:Contributions/2A00:23C6:54FE:FB01:FD18:B33A:C631:63CF|2A00:23C6:54FE:FB01:FD18:B33A:C631:63CF]] ([[User talk:2A00:23C6:54FE:FB01:FD18:B33A:C631:63CF|talk]]) 05:30, 8 June 2025 (UTC)
:::This definition would be good in other scientific fields (e.g. a meteorologic model), but the meaning of model in machine learning is broader. I still believe that your sentence would be cleaner without "... mimic the human ability to ...". I propose to wait for feedback from other editors. [[User:Alenoach|Alenoach]] ([[User talk:Alenoach|talk]]) 15:30, 8 June 2025 (UTC)
:::I am still not sold on "mimic the human ability to". It is not entirely wrong (if language is generated by humans, then predicting how it is made implies some prediction of human behavior) but it remains indirect and unnecessary. "To generate and process", that I agree on.
:::Well actually, "designed to" is very much questionable. In building the baseline LMs, LLM or not, there has been very little thought put into mimicking how humans work besides an abstract view of some "universal recursiveness of human grammar" implying that the model needs to be able to handle recursion, which guides some choice in structuring the neural network: recurrence has been the key theme in RNN, LSTM, and lately transformers. That's all there is.
:::There are even more primitive, pre-neural-network language models! Their scope of application was limited, but they were good enough for the original Google Translate (see: [[statistical machine translation]]), Wolfram Alpha, and Google the search engine. They were used to analyze language; they are even used to predict it in the field of data compression<sup>1</sup>. They are definitely not trying to mimic any human ability.
:::<sup>1</sup> Speaking of data compression, the models that work "well enough" there are even further from how humans presumably work. [[LZ77 and LZ78|Lempel–Ziv]] is as primitive as they come but with some good entropy coding it remains very competitive (see: [[Zstd]]). Of course the big [[Hutter Prize]] winners generally use the more complex models via [[context mixing]], but so far the best performer [https://github.com/kaitz/fx2-cmix fx2-cmix] is only using pre-neural-network language modeling tricks like stemming and word type analysis. [[User:Artoria2e5|Artoria]][[User talk:Artoria2e5|2e5]] <small style="font-weight:lighter">[[Special:Contributions/Artoria2e5|🌉]]</small> 11:33, 26 June 2025 (UTC)
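:::A minimal sketch of such a pre-neural-network language model, a maximum-likelihood bigram model (real systems used higher orders plus smoothing, but the idea is the same):
<syntaxhighlight lang="python">
# Predict the next word from counted co-occurrences; no neural
# network and no attempt to mimic human cognition.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def p_next(prev: str, nxt: str) -> float:
    """Maximum-likelihood estimate of P(nxt | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(p_next("the", "cat"))  # 0.25: "the" precedes cat/mat/dog/rug
</syntaxhighlight>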
::I don't have a strong opinion, but they could all be improved. "Mimic" only appears in the body from one preprint source. A and B link [[language model]], which appears to be larger and more relevant than [[computational model]]. I agree that "is a machine learning model" could be clarified. [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 19:36, 8 June 2025 (UTC)
:::[https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=prev&oldid=1294722783 Bold edit] [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 12:11, 9 June 2025 (UTC)
::::It's pretty good, but I think one key criterion for a language model to be an LLM is having a large number of (machine learning) parameters. There is also some redundancy in saying it's a "language model" and that it's "designed for natural language processing tasks". I may also try some bold edit if I find a good alternative. [[User:Alenoach|Alenoach]] ([[User talk:Alenoach|talk]]) 14:37, 12 June 2025 (UTC)
 
== Proposal: LLM Testing and Evaluation Subsection ==
 
I propose expanding the [[Large language model]] article with a dedicated "Evaluation and Testing" section to document standardized assessment methodologies for LLMs.
 
'''Conflict of Interest Disclosure''':
<nowiki>{{COI disclosure|reason=I work for TestQuality.com, a test management platform|action=discussion only|owner=BitModern, Inc.|declared=User:Joses3es}}</nowiki>
 
=== Proposed content ===
<!-- Template:Unsigned --><small class="autosigned">—&nbsp;Preceding [[Wikipedia:Signatures|unsigned]] comment added by [[User:Joses3es|Joses3es]] ([[User talk:Joses3es#top|talk]] • [[Special:Contributions/Joses3es|contribs]]) 12:02, 9 July 2025 (UTC)</small>
 
== Evaluation and Testing ==
{{cait}}
Systematic evaluation of large language models involves multiple methodologies:
 
==== 1. Benchmark Testing ====
Modern benchmarks employ sophisticated techniques to avoid test-set contamination<ref name="Dodge2021">{{cite arXiv |eprint=2104.08646 |class=cs.CL |first=Jesse |last=Dodge |title=Documenting Large Webtext Corpora |date=2021}}</ref>:
 
* '''Few-shot prompting''' (typically 5-25 examples) tests generalization while minimizing overfitting<ref name="Brown2020">{{cite journal |last=Brown |first=Tom B. |year=2020 |title=Language Models are Few-Shot Learners |journal=Advances in Neural Information Processing Systems |volume=33 |pages=1877–1901}}</ref> (see the sketch after the key examples below)
* '''BIG-bench''' framework<ref name="Srivastava2022">{{cite arXiv |eprint=2206.04615 |class=cs.CL |first=Shivani |last=Srivastava |title=Beyond the Imitation Game |date=2022}}</ref> uses standardized JSON templates for 200+ tasks
* '''HELM''' (Holistic Evaluation)<ref name="Liang2022">{{cite arXiv |eprint=2211.09110 |class=cs.CL |first=Percy |last=Liang |title=Holistic Evaluation of Language Models |date=2022}}</ref> tests under multiple conditions (prompt variations, noise levels)
 
'''Key examples''':
 
* '''MMLU'''<ref name="Hendrycks2021">{{cite arXiv |eprint=2009.03300 |class=cs.CL |first=Dan |last=Hendrycks |title=Measuring Massive Multitask Language Understanding |date=2021}}</ref>: GPT-4 scored 86.4% accuracy across 57 subjects
* '''HumanEval'''<ref name="Chen2021">{{cite arXiv |eprint=2107.03374 |class=cs.CL |first=Mark |last=Chen |title=Evaluating Large Language Models Trained on Code |date=2021}}</ref>: Codex solved 28.8% of coding problems on the first attempt
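A minimal sketch of the few-shot procedure described above; <code>model()</code> is a hypothetical stand-in for an LLM API, and the questions are placeholder data:
<syntaxhighlight lang="python">
# Few-shot evaluation: k worked examples are prepended to each test
# question, and the model's completion is compared to the answer key.

def model(prompt: str) -> str:
    """Placeholder for a real completion API call."""
    return "8"

few_shot = ["Q: 2 + 2 = ?\nA: 4"]        # k = 1 worked example
test_set = [("Q: 3 + 5 = ?\nA:", "8")]   # (question, answer key)

def accuracy(tests, shots):
    context = "\n\n".join(shots)
    correct = 0
    for question, answer in tests:
        prediction = model(context + "\n\n" + question).strip()
        correct += prediction.startswith(answer)
    return correct / len(tests)

print(accuracy(test_set, few_shot))  # 1.0 with the placeholder model
</syntaxhighlight>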
 
==== 2. Bias and Fairness ====
Industry tools integrate testing into development pipelines:
 
* '''CheckList'''<ref name="Ribeiro2020">{{cite conference |last=Ribeiro |first=Marco Tulio |year=2020 |title=Beyond Accuracy: Behavioral Testing of NLP Models |pages=9592–9608 |book-title=Proceedings of ACL}}</ref> generates adversarial tests (negations, typos; see the sketch below)
* '''DeepEval'''<ref>{{cite web |title=DeepEval: Unit Testing for LLMs |url=https://github.com/confident-ai/deepeval |publisher=GitHub}}</ref> provides pytest-compatible assertions
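A minimal sketch of CheckList-style behavioral tests, with an invariance test for typos and a directional test for negation; <code>classify()</code> is a toy stand-in for the system under test, and this does not use the actual CheckList API:
<syntaxhighlight lang="python">
import random

def classify(text: str) -> str:
    """Toy sentiment model standing in for the system under test."""
    return "negative" if " not " in text else "positive"

def add_typo(text: str) -> str:
    """Swap two adjacent characters at a random position."""
    i = random.randrange(len(text) - 1)
    return text[:i] + text[i + 1] + text[i] + text[i + 2:]

base = "The service was great."

# Invariance: a small typo should not flip the label.
assert classify(add_typo(base)) == classify(base)

# Directional: adding a negation should change the label.
assert classify("The service was not great.") != classify(base)
</syntaxhighlight>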
 
==== 3. Safety Testing ====
 
* '''Constitutional AI'''<ref name="Bai2022">{{cite arXiv |eprint=2212.08073 |class=cs.CL |first=Yuntao |last=Bai |title=Constitutional AI: Harmlessness from AI Feedback |date=2022}}</ref> implements rule-based self-critique (sketched below)
* GPT-4 was 82% less likely than GPT-3.5 to respond to requests for disallowed content during red teaming<ref name="OpenAI2023">{{cite arXiv |eprint=2303.08774 |class=cs.CL |last=OpenAI |title=GPT-4 Technical Report |date=2023}}</ref>
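A minimal sketch of the rule-based self-critique loop (the critique-and-revision phase of Constitutional AI); <code>generate()</code> and the principle text are illustrative placeholders, not the paper's exact prompts:
<syntaxhighlight lang="python">
# The model drafts a response, critiques it against a written
# principle, then revises; all three steps are plain completions.

PRINCIPLE = "Choose the response that is least likely to cause harm."

def generate(prompt: str) -> str:
    """Placeholder for a real completion API call."""
    return "<completion>"

def constitutional_step(user_request: str) -> str:
    draft = generate(user_request)
    critique = generate(
        f"Critique the response below under this principle: {PRINCIPLE}\n"
        f"Response: {draft}")
    return generate(
        f"Rewrite the response to address the critique.\n"
        f"Response: {draft}\nCritique: {critique}")
</syntaxhighlight>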
 
=== References ===
{{reflist|2}}
{{caib}}
[[User:Joses3es|Jose F. Amoros]] ([[User talk:Joses3es|talk]]) 12:02, 9 July 2025 (UTC)
 
=== Follow-up ===
To facilitate discussion, I’d specifically appreciate input on:
 
1. '''Structural placement''': Should this be a standalone section or part of "Limitations"?
 
2. '''Content priorities''': Are there any evaluation methods that need more/less emphasis?
 
3. '''Next steps''': If there’s consensus, would an editor be willing to help implement this?
 
Thank you for your time and expertise. —[[User:Joses3es|Joses3es]] ([[User talk:Joses3es|talk]]) 12:12, 9 July 2025 (UTC)
 
:The technical content is concise and clearly referenced. Its weakness is the reliance on [[WP:ARXIV]]. [[User:WeyerStudentOfAgrippa|WeyerStudentOfAgrippa]] ([[User talk:WeyerStudentOfAgrippa|talk]]) 12:37, 9 July 2025 (UTC)
:There is already a section "Evaluation", with a subsection "[[Large language model#Benchmarks|Benchmarks]]". Is it missing important information? [[User:Alenoach|Alenoach]] ([[User talk:Alenoach|talk]]) 21:17, 9 July 2025 (UTC)
:LLM-generated comments (including edit requests) are [[WP:AITALK|not acceptable on Wikipedia]], even on the talk page for the Wikipedia article about large language models. If you would like to participate on Wikipedia, please write in your own words. —&nbsp;'''''[[User:Newslinger|<span style="color:#536267;">Newslinger</span>]]'''&nbsp;<small>[[User talk:Newslinger#top|<span style="color:#708090;">talk</span>]]</small>'' 15:14, 13 July 2025 (UTC)
 
== "Substantial" "Furthermore" and so on. ==
 
This is regarding [https://en.wikipedia.org/w/index.php?title=Large_language_model&diff=prev&oldid=1304775148 this revert].
 
{{ping|AdamJones88}} Please don't just cram a bunch of jargony terms into the article and tell readers that "substantial research" exists. This article already has far too much of that. Another example is the sentence "{{tq|Collectively, these advances have made LLMs more scalable, sustainable, and deployable across diverse settings, from centralized data centers to edge devices.}}" This is vacuous. It sounds vaguely impressive to someone who doesn't think about it too hard, but says nothing of substance at all and is completely unfalsifiable.
 
[[Wikipedia:Large language models|Please do not use LLMs to generate Wikipedia article content]] (or talk page comments, for that matter). If you accidentally write something that includes a bunch of obvious [[WP:AITELLS]], start over and rewrite it to match [[WP:MOS]], [[WP:TONE]], etc. [[User:Grayfell|Grayfell]] ([[User talk:Grayfell|talk]]) 03:00, 8 August 2025 (UTC)