{{Use dmy dates|date=July 2022}}
A '''language model''' is a [[Model#Conceptual model|model]] of the human brain's ability to produce [[natural language]].<ref>{{cite journal |last1=Blank |first1=Idan A. |title=What are large language models supposed to model? |journal=Trends in Cognitive Sciences |date=November 2023 |volume=27 |issue=11 |pages=987–989 |doi=10.1016/j.tics.2023.08.006|pmid=37659920 |doi-access=free }}"LLMs are supposed to model how utterances behave." </ref><ref>{{cite book |last1=Jurafsky |first1=Dan |last2=Martin |first2=James H. |title=Speech and Language Processing |date=2021 |edition=3rd |url=https://web.stanford.edu/~jurafsky/slp3/ |access-date=24 May 2022 |chapter=N-gram Language Models |chapter-url= https://web.stanford.edu/~jurafsky/slp3/3.pdf |archive-date=22 May 2022 |archive-url=https://web.archive.org/web/20220522005855/https://web.stanford.edu/~jurafsky/slp3/ |url-status=live }}</ref> Language models are useful for a variety of tasks, including [[speech recognition]],<ref>Kuhn, Roland, and Renato De Mori (1990). [https://www.researchgate.net/profile/Roland_Kuhn2/publication/3191800_Cache-based_natural_language_model_for_speech_recognition/links/004635184ee5b2c24f000000.pdf "A cache-based natural language model for speech recognition"]. ''IEEE Transactions on Pattern Analysis and Machine Intelligence'' 12.6: 570–583.</ref> [[machine translation]], [[natural language generation]], and [[information retrieval]].
[[Large language model]]s (LLMs), currently their most advanced form, are predominantly based on [[Transformer (deep learning architecture)|transformer]]s trained on large text corpora.
== History ==
[[Noam Chomsky]] did pioneering work on language models in the 1950s by developing a theory of [[formal grammar]]s.
In the 2000s, continuous representations for words, such as [[Word2vec|word embeddings]], began to replace discrete representations.<ref>{{Cite news |date=2022-02-22 |title=The Nature Of Life, The Nature Of Thinking: Looking Back On Eugene}}</ref>
== Pure statistical models ==
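The simplest statistical language models estimate [[n-gram]] probabilities directly from corpus counts. The following sketch illustrates a maximum-likelihood bigram model; the toy corpus and helper names are illustrative assumptions, not drawn from this article:

```python
from collections import Counter, defaultdict

# Toy corpus (illustrative only), pre-tokenized into words.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each preceding word.
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def bigram_prob(prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][word] / total if total else 0.0

# "the" occurs four times and is followed by "cat" once.
print(bigram_prob("the", "cat"))  # → 0.25
```

Because bigrams never observed in the corpus receive probability zero, practical n-gram models combine such counts with smoothing techniques (see Chen & Goodman in the sources below).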
== Neural models ==
=== Recurrent neural network ===
Continuous representations or [[Word embedding|embeddings of words]] are produced in [[recurrent neural network]]-based language models (known also as ''continuous space language models'').<ref>{{cite web |last1=Karpathy |first1=Andrej |title=The Unreasonable Effectiveness of Recurrent Neural Networks |url=https://karpathy.github.io/2015/05/21/rnn-effectiveness/ |access-date=27 January 2019 |archive-date=1 November 2020 |archive-url=https://web.archive.org/web/20201101215448/http://karpathy.github.io/2015/05/21/rnn-effectiveness/ |url-status=live }}</ref> Such continuous space embeddings help to alleviate the [[curse of dimensionality]], which is the consequence of the number of possible sequences of words increasing [[Exponential growth|exponentially]] with the size of the vocabulary, which in turn causes a data sparsity problem.
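The contrast between discrete and continuous representations can be made concrete with a back-of-the-envelope calculation; the vocabulary size, sequence length, and embedding dimension below are illustrative assumptions, not figures from this article:

```python
import numpy as np

# A discrete model over sequences of n words from a vocabulary of size V
# must in principle distinguish V**n sequences, whereas an embedding layer
# stores only one d-dimensional vector per word, i.e. V * d parameters.
V, n, d = 10_000, 5, 300
print(V ** n)  # 10**20 possible 5-word sequences
print(V * d)   # 3,000,000 embedding parameters

# Random vectors stand in for trained embeddings in this sketch.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(V, d))

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Similarity is defined for any pair of words, including pairs that
# never co-occur in the training data -- this is what eases sparsity.
print(cosine(embeddings[0], embeddings[1]))
```

A trained embedding space places related words near each other, so evidence about one word generalizes to its neighbors instead of being spread over exponentially many discrete contexts.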
=== Large language models ===
{{excerpt|Large language model}}
Although sometimes matching human performance, it is not clear whether they are plausible [[cognitive model]]s.
== Evaluation and benchmarks ==
== See also ==
{{portal |Linguistics |Mathematics |Technology}}
{{div col}}
* {{Annotated link|Artificial intelligence and elections}}
* [[Cache language model]]
{{refbegin}}
* {{cite conference |author1=Jay M. Ponte |author2=W. Bruce Croft |title=A language modeling approach to information retrieval |conference=Research and Development in Information Retrieval |year=1998 |citeseerx=10.1.1.117.4237 |doi=10.1145/290941.291008 |doi-access=free }}
* {{cite conference |author1=Fei Song |author2=W. Bruce Croft |title=A general language model for information retrieval |conference=Research and Development in Information Retrieval |year=1999 |citeseerx=10.1.1.21.6467 |doi=10.1145/319950.320022 |doi-access=free }}
* {{cite tech report |first=Stanley F. |last=Chen |author2=Joshua Goodman |title=An Empirical Study of Smoothing Techniques for Language Modeling |institution=Harvard University |year=1998 |citeseerx=10.1.1.131.5458 |url=https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=273adbdb43097636aa9260d9ecd60d0787b0ef4d }}