{{Use dmy dates|date=July 2022}}
A '''language model''' is a [[Model#Conceptual model|model]] of natural language.<ref>{{cite book |last1=Jurafsky |first1=Dan |last2=Martin |first2=James H. |title=Speech and Language Processing |date=2021 |edition=3rd |url=https://web.stanford.edu/~jurafsky/slp3/ |access-date=24 May 2022 |chapter=N-gram Language Models |archive-date=22 May 2022 |archive-url=https://web.archive.org/web/20220522005855/https://web.stanford.edu/~jurafsky/slp3/ |url-status=live }}</ref> Language models are useful for a variety of tasks, including [[speech recognition]].<ref>Kuhn, Roland, and Renato De Mori (1990). [https://www.researchgate.net/profile/Roland_Kuhn2/publication/3191800_Cache-based_natural_language_model_for_speech_recognition/links/004635184ee5b2c24f000000.pdf "A cache-based natural language model for speech recognition"]. ''IEEE Transactions on Pattern Analysis and Machine Intelligence'' 12.6: 570–583.</ref>
[[Large language model]]s, currently their most advanced form, combine larger datasets (frequently using words [[Web scraping|scraped]] from the public [[internet]]), [[feedforward neural network]]s, and [[transformer (machine learning)|transformer]]s. They have superseded [[recurrent neural network]]-based models, which had previously superseded purely statistical models such as the [[Word n-gram language model|word ''n''-gram language model]].
== History ==
[[Noam Chomsky]] did pioneering work on language models in the 1950s by developing a theory of [[formal grammar]]s.
In 1980, statistical approaches were explored and found to be more useful for many purposes than rule-based formal grammars. Discrete representations like [[Word n-gram language model|word ''n''-gram language models]], which assign probabilities to discrete combinations of words, brought significant advances.
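A word ''n''-gram model of this kind can be illustrated with a minimal bigram sketch in Python; the toy corpus and the function name here are illustrative only, and real models add smoothing for unseen word pairs:

```python
from collections import Counter

# Toy corpus (hypothetical); real models are trained on large text collections.
corpus = "the cat sat on the mat the cat ate".split()

# Count bigrams (adjacent word pairs) and the contexts they condition on.
bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def bigram_prob(w_prev, w):
    """Maximum-likelihood estimate of P(w | w_prev)."""
    return bigrams[(w_prev, w)] / contexts[w_prev]

# "the" is followed by "cat" in 2 of its 3 occurrences as a context.
print(bigram_prob("the", "cat"))
```

An unseen pair such as ("cat", "on") would get probability zero under this maximum-likelihood estimate, which is why practical ''n''-gram models apply smoothing techniques.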
{{excerpt|Large language model}}
Although they sometimes match human performance, it is not clear whether they are plausible [[cognitive model]]s.
== Evaluation and benchmarks ==