{{Use dmy dates|date=July 2022}}
A '''language model''' is a probabilistic [[Model#Conceptual model|model]] of a natural language.<ref>{{cite book |last1=Jurafsky |first1=Dan |last2=Martin |first2=James H. |title=Speech and Language Processing |date=2021 |edition=3rd |url=https://web.stanford.edu/~jurafsky/slp3/ |access-date=24 May 2022 |chapter=N-gram Language Models |archive-date=22 May 2022 |archive-url=https://web.archive.org/web/20220522005855/https://web.stanford.edu/~jurafsky/slp3/ |url-status=live }}</ref> In 1980, the first significant statistical language model was proposed, and during the decade IBM performed "[[Claude Shannon|Shannon]]-style" experiments, in which potential sources for language modeling improvement were identified by observing and analyzing the performance of human subjects in predicting or correcting text.<ref>{{cite journal |last1=Rosenfeld |first1=Ronald |year=2000 |title=Two decades of statistical language modeling: Where do we go from here? |journal=Proceedings of the IEEE |volume=88 |issue=8|pages=1270–1278 |doi=10.1109/5.880083 |s2cid=10959945 |url=https://figshare.com/articles/journal_contribution/6611138 }}</ref>

Language models are useful for a variety of tasks, including [[speech recognition]]<ref>Kuhn, Roland, and Renato De Mori (1990). [https://www.researchgate.net/profile/Roland_Kuhn2/publication/3191800_Cache-based_natural_language_model_for_speech_recognition/links/004635184ee5b2c24f000000.pdf "A cache-based natural language model for speech recognition"]. ''IEEE transactions on pattern analysis and machine intelligence'' 12.6: 570–583.</ref> (helping prevent predictions of low-probability (e.g. nonsense) sequences), [[machine translation]],<ref name="Semantic parsing as machine translation">Andreas, Jacob, Andreas Vlachos, and Stephen Clark (2013). [https://www.aclweb.org/anthology/P13-2009 "Semantic parsing as machine translation"] {{Webarchive|url=https://web.archive.org/web/20200815080932/https://www.aclweb.org/anthology/P13-2009/ |date=15 August 2020 }}. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).</ref> [[natural language generation]] (generating more human-like text), [[optical character recognition]], [[route optimization]],<ref>{{cite journal |last1=Liu |first1=Yang |last2=Wu |first2=Fanyou |last3=Liu |first3=Zhiyuan |last4=Wang |first4=Kai |last5=Wang |first5=Feiyue |last6=Qu |first6=Xiaobo |title=Can language models be used for real-world urban-delivery route optimization? |journal=The Innovation |date=2023 |volume=4 |issue=6 |pages=100520 |doi=10.1016/j.xinn.2023.100520 |doi-access=free|pmid=37869471 |pmc=10587631 |bibcode=2023Innov...400520L }}</ref> [[handwriting recognition]],<ref>Pham, Vu, et al (2014). [https://arxiv.org/abs/1312.4569 "Dropout improves recurrent neural networks for handwriting recognition"] {{Webarchive|url=https://web.archive.org/web/20201111170554/https://arxiv.org/abs/1312.4569 |date=11 November 2020 }}. 14th International Conference on Frontiers in Handwriting Recognition. IEEE.</ref> [[grammar induction]],<ref>Htut, Phu Mon, Kyunghyun Cho, and Samuel R.
Bowman (2018). [https://arxiv.org/pdf/1808.10000.pdf?source=post_page--------------------------- "Grammar induction with neural language models: An unusual replication"] {{Webarchive|url=https://web.archive.org/web/20220814010528/https://arxiv.org/pdf/1808.10000.pdf?source=post_page--------------------------- |date=14 August 2022 }}. {{arXiv|1808.10000}}.</ref> and [[information retrieval]].<ref name=ponte1998>{{cite conference |first1=Jay M. |last1=Ponte |first2= W. Bruce |last2=Croft | title= A language modeling approach to information retrieval |conference=Proceedings of the 21st ACM SIGIR Conference |year=1998 |publisher=ACM |place=Melbourne, Australia | pages = 275–281| doi=10.1145/290941.291008}}</ref><ref name=hiemstra1998>{{cite conference | first=Djoerd | last=Hiemstra | year = 1998 | title = A linguistically motivated probabilistically model of information retrieval | conference = Proceedings of the 2nd European conference on Research and Advanced Technology for Digital Libraries | publisher = LNCS, Springer | pages=569–584 | doi= 10.1007/3-540-49653-X_34}}</ref>
[[Large language model]]s, currently their most advanced form, are a combination of larger datasets (frequently using words [[Web scraping|scraped]] from the public [[internet]]), [[feedforward neural network]]s, and [[transformer (machine learning)|transformer]]s. They have superseded [[recurrent neural network]]-based models, which had previously superseded pure statistical models such as the [[Word n-gram language model|word ''n''-gram language model]].
== History ==
[[Noam Chomsky]] did early work on language models by developing a theory of [[Formal grammar|formal grammars]], which are also fundamental to the field of [[Programming language|programming languages]].<ref>{{Cite journal |last=Chomsky |first=N. |date=September 1956 |title=Three models for the description of language |url=https://ieeexplore.ieee.org/document/1056813 |journal=IRE Transactions on Information Theory |volume=2 |issue=3 |pages=113–124 |doi=10.1109/TIT.1956.1056813 |issn=2168-2712}}</ref>
Later, statistical approaches based on discrete representations were found to be more useful for many purposes than rule-based formal grammars.
In the 2000s, continuous representations for words, such as [[Word2vec]], began to replace discrete representations.<ref>{{Cite news |date=2022-02-22 |title=The Nature Of Life, The Nature Of Thinking: Looking Back On Eugene Charniak's Work And Life |url=https://cs.brown.edu/news/2022/02/22/the-nature-of-life-the-nature-of-thinking-looking-back-on-eugene-charniaks-work-and-life/ |archive-url=http://web.archive.org/web/20241103134558/https://cs.brown.edu/news/2022/02/22/the-nature-of-life-the-nature-of-thinking-looking-back-on-eugene-charniaks-work-and-life/ |archive-date=2024-11-03 |access-date=2025-02-05 |language=en}}</ref>
== Pure statistical models ==
=== Models based on word ''n''-grams ===
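A word ''n''-gram model estimates the probability of each word from the counts of short word sequences in a training corpus. The following is a minimal sketch of a bigram (''n'' = 2) model using maximum-likelihood estimates; the toy corpus and the function name <code>bigram_prob</code> are illustrative assumptions, not a standard implementation.

```python
from collections import Counter

# Toy training corpus (illustrative only).
corpus = "the cat sat on the mat the cat ate".split()

# Count single words and adjacent word pairs.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def bigram_prob(w1, w2):
    """Maximum-likelihood estimate: P(w2 | w1) = count(w1 w2) / count(w1)."""
    return bigrams[(w1, w2)] / unigrams[w1]

print(bigram_prob("the", "cat"))  # "the cat" occurs 2 times, "the" 3 times: 2/3
```

In practice such models require [[Smoothing (statistics)|smoothing]], since any bigram unseen in training would otherwise receive probability zero.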