{{Short description|
{{Multiple issues|
{{More citations needed|date=May 2024}}
{{Cleanup rewrite|date=July 2025}}
{{Cleanup reorganize|date=July 2025}}
}}
'''Natural language processing''' (NLP) is the processing of [[natural language]] information by a [[computer]]. The study of NLP, a subfield of [[computer science]], is generally associated with [[artificial intelligence]]. NLP is related to [[information retrieval]], [[knowledge representation]], [[computational linguistics]], and more broadly with [[linguistics]].<ref name="nlpintro">
{{cite book |last=Eisenstein |first=Jacob |date=October 1, 2019 |title=Introduction to Natural Language Processing |url=https://mitpress.mit.edu/9780262042840/introduction-to-natural-language-processing/ |location= |publisher=The MIT Press |page=1 |isbn=9780262042840 |access-date=}}</ref>
Major processing tasks in
== History ==
The premise of symbolic NLP is well-summarized by [[John Searle]]'s [[Chinese room]] experiment: Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts.
* '''1950s''': The [[Georgetown-IBM experiment|Georgetown experiment]] in 1954 involved fully [[automatic translation]] of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem.<ref>{{cite web|author=Hutchins, J.|year=2005|url=http://www.hutchinsweb.me.uk/Nutshell-2005.pdf|title=The history of machine translation in a nutshell|access-date=2019-02-04|archive-date=2019-07-13|archive-url=https://web.archive.org/web/20190713103044/http://www.hutchinsweb.me.uk/Nutshell-2005.pdf|url-status=dead}}{{self-published source|date=December 2013}}</ref> However, real progress was much slower, and after the [[ALPAC|ALPAC report]] in 1966, which found that ten years of research had failed to fulfill the expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted in America (though some research continued elsewhere, such as Japan and Europe<ref>"ALPAC: the (in)famous report", John Hutchins, MT News International, no. 14, June 1996, pp. 9–12.</ref>) until the late 1980s when the first [[statistical machine translation]] systems were developed.
* '''1960s''': Some notably successful natural language processing systems developed in the 1960s were [[SHRDLU]], a natural language system working in restricted "[[blocks world]]s" with restricted vocabularies, and [[ELIZA]], a simulation of a [[Rogerian psychotherapy|Rogerian psychotherapist]], written by [[Joseph Weizenbaum]] between 1964 and 1966. Using almost no information about human thought or emotion, ELIZA sometimes provided a startlingly human-like interaction. When the "patient" exceeded the very small knowledge base, ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?".
* '''1970s''': During the 1970s, many programmers began to write "conceptual [[ontology (information science)|ontologies]]", which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert, 1981). During this time, the first [[chatterbots]] were written (e.g., [[PARRY]]).
* '''1980s''': The 1980s and early 1990s mark the heyday of symbolic methods in NLP. Focus areas of the time included research on rule-based parsing (e.g., the development of [[Head-driven phrase structure grammar|HPSG]] as a computational operationalization of [[generative grammar]]), morphology (e.g., two-level morphology<ref>{{citation|last=Koskenniemi|first=Kimmo|title=Two-level morphology: A general computational model of word-form recognition and production|url=http://www.ling.helsinki.fi/~koskenni/doc/Two-LevelMorphology.pdf|year=1983|publisher=Department of General Linguistics, [[University of Helsinki]]|author-link=Kimmo Koskenniemi|access-date=2020-08-20|archive-date=2018-12-21|archive-url=https://web.archive.org/web/20181221032913/http://www.ling.helsinki.fi/~koskenni/doc/Two-LevelMorphology.pdf|url-status=dead}}</ref>), semantics (e.g., [[Lesk algorithm]]), reference (e.g., within Centering Theory<ref>Joshi, A. K., & Weinstein, S. (1981, August). [https://www.ijcai.org/Proceedings/81-1/Papers/071.pdf Control of Inference: Role of Some Aspects of Discourse Structure-Centering]. In ''IJCAI'' (pp. 385–387).</ref>) and other areas of natural language understanding (e.g., in the [[Rhetorical structure theory|Rhetorical Structure Theory]]). Other lines of research were continued, e.g., the development of chatterbots with [[Racter]] and [[Jabberwacky]]. An important development (that eventually led to the statistical turn in the 1990s) was the rising importance of quantitative evaluation in this period.<ref>{{Cite journal|last1=Guida|first1=G.|last2=Mauri|first2=G.|date=July 1986|title=Evaluation of natural language processing systems: Issues and approaches|journal=Proceedings of the IEEE|volume=74|issue=7|pages=1026–1035|doi=10.1109/PROC.1986.13580|s2cid=30688575|issn=1558-2256}}</ref>
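ELIZA's behaviour, as described in the 1960s entry above, rests on keyword pattern matching combined with pronoun "reflection", which turns a statement such as "My head hurts" into the question "Why do you say your head hurts?". The following Python sketch is a minimal illustration under assumed rules: the rule list, the <code>REFLECTIONS</code> table and the functions <code>reflect</code> and <code>respond</code> are hypothetical and are not taken from Weizenbaum's original DOCTOR script.

```python
import re

# Hypothetical reflection table: swaps first- and second-person words so that
# part of the user's statement can be echoed back as a question.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my", "mine": "yours",
}

# Hypothetical rules as (pattern, response template); the first match wins.
RULES = [
    (re.compile(r"i need (.*)", re.IGNORECASE), "Why do you need {0}?"),
    (re.compile(r"i am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    # Catch-all rule: reflect the whole statement back as a generic question.
    (re.compile(r"(.*)", re.IGNORECASE), "Why do you say {0}?"),
]


def reflect(fragment: str) -> str:
    """Swap pronouns word by word; other words are lower-cased and kept."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(w, w) for w in words)


def respond(statement: str) -> str:
    """Return the response of the first rule whose pattern matches the input."""
    text = statement.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.fullmatch(text)
        if match:
            return template.format(*(reflect(g) for g in match.groups()))
    # Fallback for input no rule matches (e.g. text containing a newline).
    return "Please go on."
```

Rules are tried in order, so the catch-all pattern only fires when no more specific keyword matches; in simplified form, this mirrors the generic responses ELIZA gave when input fell outside its very small knowledge base.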
=== Statistical NLP (
Up until the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of [[machine learning]] algorithms for language processing. This was due to both the steady increase in computational power (see [[Moore's law]]) and the gradual lessening of the dominance of [[Noam Chomsky|Chomskyan]] theories of linguistics (e.g. [[transformational grammar]]), whose theoretical underpinnings discouraged the sort of [[corpus linguistics]] that underlies the machine-learning approach to language processing.<ref>Chomskyan linguistics encourages the investigation of "[[corner case]]s" that stress the limits of its theoretical models (comparable to [[pathological (mathematics)|pathological]] phenomena in mathematics), typically created using [[thought experiment]]s, rather than the systematic investigation of typical phenomena that occur in real-world data, as is the case in [[corpus linguistics]]. The creation and use of such [[text corpus|corpora]] of real-world data is a fundamental part of machine-learning algorithms for natural language processing. In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called "[[poverty of the stimulus]]" argument entail that general learning algorithms, as are typically used in machine learning, cannot be successful in language processing. As a result, the Chomskyan paradigm discouraged the application of such models to language processing.</ref>
*'''1990s''': Many of the notable early successes in statistical methods in NLP occurred in the field of [[machine translation]], due especially to work at IBM Research, such as [[IBM alignment models]]. These systems were able to take advantage of existing multilingual [[text corpus|textual corpora]] that had been produced by the [[Parliament of Canada]] and the [[European Union]] as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, most other systems depended on corpora specifically developed for the tasks implemented by these systems, which was (and often continues to be) a major limitation in the success of these systems. As a result, a great deal of research has gone into methods of more effectively learning from limited amounts of data.
*'''2000s''': With the growth of the web, increasing amounts of raw (unannotated) language data have become available since the mid-1990s. Research has thus increasingly focused on [[unsupervised learning|unsupervised]] and [[semi-supervised learning]] algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than [[supervised learning]], and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the [[World Wide Web]]), which can often make up for the
In 2003, the [[word n-gram language model|word n-gram model]], at the time the best statistical algorithm, was outperformed in [[language model]]ling by a [[multi-layer perceptron]] (with a single hidden layer and a context length of several words, trained on up to 14 million words with a CPU cluster) developed by [[Yoshua Bengio]] and co-authors.<ref>{{Cite journal|url=https://dl.acm.org/doi/10.5555/944919.944966|title=A neural probabilistic language model|first1=Yoshua|last1=Bengio|first2=Réjean|last2=Ducharme|first3=Pascal|last3=Vincent|first4=Christian|last4=Janvin|date=March 1, 2003|journal=The Journal of Machine Learning Research|volume=3|pages=1137–1155|via=ACM Digital Library}}</ref>
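The word n-gram baseline mentioned above estimates the probability of a word from the preceding n − 1 words using corpus counts. A minimal bigram (n = 2) sketch in Python follows, assuming maximum-likelihood estimates with no smoothing; the function names and the toy corpus are illustrative assumptions, and practical systems of the period used much larger, smoothed models.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Count every token in its role as a left context (all but the final </s>).
        unigrams.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams


def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate P(word | prev) = c(prev, word) / c(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]


def sentence_prob(unigrams, bigrams, sentence):
    """Probability of a sentence as the product of its bigram probabilities."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        prob *= bigram_prob(unigrams, bigrams, prev, word)
    return prob
```

Trained on the toy corpus "the cat sat", "the dog sat" and "the cat ran", the sketch gives P(cat | the) = 2/3 and P("the cat sat") = 1/3, but assigns probability 0 to "the dog ran" because the pair (dog, ran) never occurs. This data-sparsity problem is one reason smoothing techniques, and later neural language models with learned word representations, were developed.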
In 2010, [[Tomáš Mikolov]] (then a PhD student at [[Brno University of Technology]]) and co-authors applied a simple [[recurrent neural network]] with a single hidden layer to language modelling,<ref>{{cite book |last1=Mikolov |first1=Tomáš |last2=Karafiát |first2=Martin |last3=Burget |first3=Lukáš |last4=Černocký |first4=Jan |last5=Khudanpur |first5=Sanjeev |title=Interspeech 2010 |chapter=Recurrent neural network based language model |journal=Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 |date=26 September 2010 |pages=1045–1048 |doi=10.21437/Interspeech.2010-343 |s2cid=17048224 |chapter-url=https://gwern.net/doc/ai/nn/rnn/2010-mikolov.pdf |language=en}}</ref> and in the following years he went on to develop [[Word2vec]]. In the 2010s, [[representation learning]] and [[deep learning|deep neural network]]-style (featuring many hidden layers) machine learning methods became widespread in natural language processing. That popularity was due partly to a flurry of results showing that such techniques<ref name="goldberg:nnlp17">{{cite journal |last=Goldberg |first=Yoav |year=2016 |arxiv=1807.10854 |title=A Primer on Neural Network Models for Natural Language Processing |journal=Journal of Artificial Intelligence Research |volume=57 |pages=345–420 |doi=10.1613/jair.4992 |s2cid=8273530 }}</ref><ref name="goodfellow:book16">{{cite book |first1=Ian |last1=Goodfellow |first2=Yoshua |last2=Bengio |first3=Aaron |last3=Courville |url=http://www.deeplearningbook.org/ |title=Deep Learning |publisher=MIT Press |year=2016 }}</ref> can achieve state-of-the-art results in many natural language tasks, e.g., in [[language modeling]]<ref name="jozefowicz:lm16">{{cite book |first1=Rafal |last1=Jozefowicz |first2=Oriol |last2=Vinyals |first3=Mike |last3=Schuster |first4=Noam |last4=Shazeer |first5=Yonghui |last5=Wu |year=2016 |arxiv=1602.02410 |title=Exploring the Limits of Language Modeling 
|bibcode=2016arXiv160202410J }}</ref> and parsing.<ref name="choe:emnlp16">{{cite journal |first1=Do Kook |last1=Choe |first2=Eugene |last2=Charniak |journal=Emnlp 2016 |url=https://aclanthology.coli.uni-saarland.de/papers/D16-1257/d16-1257 |title=Parsing as Language Modeling |access-date=2018-10-22 |archive-date=2018-10-23 |archive-url=https://web.archive.org/web/20181023034804/https://aclanthology.coli.uni-saarland.de/papers/D16-1257/d16-1257 |url-status=dead }}</ref><ref name="vinyals:nips15">{{cite journal |last1=Vinyals |first1=Oriol |last2=Kaiser |first2=Lukasz |display-authors=1 |journal=Nips2015 |title=Grammar as a Foreign Language |year=2014 |arxiv=1412.7449 |bibcode=2014arXiv1412.7449V |url=https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf }}</ref> NLP is increasingly important [[artificial intelligence in healthcare|in medicine and healthcare]], where it helps analyze notes and text in [[Electronic health record|electronic health records]] that would otherwise be inaccessible for study when seeking to improve care<ref>{{Cite journal|last1=Turchin|first1=Alexander|last2=Florez Builes|first2=Luisa F.|date=2021-03-19|title=Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review|journal=Journal of Diabetes Science and Technology|volume=15|issue=3|language=en|pages=553–560|doi=10.1177/19322968211000831|pmid=33736486|pmc=8120048|issn=1932-2968}}</ref> or to protect patient privacy.<ref>{{Cite journal |last1=Lee |first1=Jennifer |last2=Yang |first2=Samuel |last3=Holland-Hall |first3=Cynthia |last4=Sezgin |first4=Emre |last5=Gill |first5=Manjot |last6=Linwood |first6=Simon |last7=Huang |first7=Yungui |last8=Hoffman |first8=Jeffrey |date=2022-06-10 |title=Prevalence of Sensitive Terms in Clinical Notes Using Natural Language Processing Techniques: Observational Study |journal=JMIR Medical Informatics |language=en |volume=10 |issue=6 |pages=e38482 |doi=10.2196/38482 |issn=2291-9694 |pmc=9233261 
|pmid=35687381 |doi-access=free }}</ref>
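The recurrent language model of Mikolov and co-authors mentioned above keeps a hidden state that summarizes the words read so far, so in principle its next-word prediction is not limited to a fixed window of preceding words as in an n-gram model. The Python sketch below shows only the forward pass of a single-hidden-layer recurrent network; the sigmoid hidden activation, softmax output, absence of bias terms, random untrained weights and all names (<code>TinyRNNLM</code>, <code>step</code>, <code>forward</code>) are assumptions made for illustration, not the published configuration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic activation used for the hidden layer in this sketch."""
    return 1.0 / (1.0 + math.exp(-z))


class TinyRNNLM:
    """Forward pass of a single-hidden-layer recurrent language model.

    Weights are random and untrained and there are no bias terms; the point
    is only to show how the hidden state carries context between words.
    """

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.vocab_size = vocab_size
        self.hidden_size = hidden_size
        self.U = init(hidden_size, vocab_size)   # one-hot input word -> hidden
        self.W = init(hidden_size, hidden_size)  # previous hidden state -> hidden
        self.V = init(vocab_size, hidden_size)   # hidden -> next-word scores

    def step(self, word_id, h_prev):
        """Return the new hidden state and the next-word distribution."""
        # For a one-hot input vector, U @ x is simply column `word_id` of U.
        h = [
            sigmoid(self.U[j][word_id]
                    + sum(self.W[j][k] * h_prev[k] for k in range(self.hidden_size)))
            for j in range(self.hidden_size)
        ]
        scores = [
            sum(self.V[i][j] * h[j] for j in range(self.hidden_size))
            for i in range(self.vocab_size)
        ]
        return h, softmax(scores)

    def forward(self, word_ids):
        """Process a sequence of word ids; return one distribution per position."""
        h = [0.0] * self.hidden_size  # initial (empty) context
        distributions = []
        for word_id in word_ids:
            h, probs = self.step(word_id, h)
            distributions.append(probs)
        return distributions
```

For example, <code>TinyRNNLM(vocab_size=5, hidden_size=3, seed=42).forward([0, 3, 1])</code> returns three distributions over the five-word vocabulary, one for the word following each position. Because the hidden state is fed back at every step, the same current word can yield different predictions depending on what preceded it; training (e.g. by backpropagation through time) is omitted here.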
==Approaches: Symbolic, statistical, neural networks{{anchor|Statistical natural language processing (SNLP)}} ==
* the larger such a (probabilistic) language model is, the more accurate it becomes, in contrast to rule-based systems that can gain accuracy only by increasing the number and complexity of the rules, which leads to [[intractable problem|intractability]] problems.
Rule-based systems are commonly used:
* when the amount of training data is insufficient to successfully apply machine learning methods, e.g., for the machine translation of low-resource languages, as provided by the [[Apertium]] system,
* for preprocessing in NLP pipelines, e.g., [[Tokenization (lexical analysis)|tokenization]], or
=== Neural networks ===
{{Further|Artificial neural network}}
A major drawback of statistical methods is that they require elaborate [[feature engineering]]. Since 2015,<ref>{{Cite web |last=Socher |first=Richard |title=Deep Learning For NLP-ACL 2012 Tutorial |url=https://www.socher.org/index.php/Main/DeepLearningForNLP-ACL2012Tutorial |access-date=2020-08-17 |website=www.socher.org |archive-date=2021-04-14 |archive-url=https://web.archive.org/web/20210414054126/https://www.socher.org/index.php/Main/DeepLearningForNLP-ACL2012Tutorial |url-status=dead }} This was an early deep learning tutorial at ACL 2012 and met with both interest and (at the time) skepticism from most participants. Until then, neural learning was largely rejected because of its lack of statistical interpretability. By 2015, deep learning had evolved into the major framework of NLP. [Link is broken, try http://web.stanford.edu/class/cs224n/]</ref> the statistical approach has been replaced by the [[Artificial neural network|neural networks]] approach, using [[
Intermediate tasks (e.g., part-of-speech tagging and dependency parsing) are no longer needed.
[[Neural machine translation]], based on then-newly
== Common NLP tasks ==
; [[Argument mining]]
:The goal of argument mining is the automatic extraction and identification of argumentative structures from [[natural language]] text with the aid of computer programs.<ref>{{Cite journal|last1=Lippi|first1=Marco|last2=Torroni|first2=Paolo|date=2016-04-20|title=Argumentation Mining: State of the Art and Emerging Trends|url=https://dl.acm.org/doi/10.1145/2850417|journal=ACM Transactions on Internet Technology|language=en|volume=16|issue=2|pages=1–25|doi=10.1145/2850417|hdl=11585/523460|s2cid=9561587|issn=1533-5399|hdl-access=free}}</ref> Such argumentative structures include premises, conclusions, the [[argument scheme]], and the relationship between the main and subsidiary arguments, or between the main argument and a counter-argument, within discourse.<ref>{{Cite web|title=Argument Mining – IJCAI2016 Tutorial|url=https://www.i3s.unice.fr/~villata/tutorialIJCAI2016.html|access-date=2021-03-09|website=www.i3s.unice.fr|archive-date=2021-04-18|archive-url=https://web.archive.org/web/20210418083659/https://www.i3s.unice.fr/~villata/tutorialIJCAI2016.html|url-status=dead}}</ref><ref>{{Cite web|title=NLP Approaches to Computational Argumentation – ACL 2016, Berlin|url=http://acl2016tutorial.arg.tech/|access-date=2021-03-09|language=en-GB}}</ref>
=== Higher-level NLP applications ===
; [[Machine translation]] (MT)
:Automatically translate text from one human language to another. This is one of the most difficult problems, and is a member of a class of problems colloquially termed "[[AI-complete]]", i.e. requiring all of the different types of knowledge that humans possess (grammar, semantics, facts about the real world, etc.) to solve properly.
; [[Natural
; [[Natural
:Convert information from computer databases or semantic intents into readable human language.
; Book generation
# Apply the theory of [[conceptual metaphor]], explained by Lakoff as "the understanding of one idea, in terms of another", which provides an indication of the author's intent.<ref>{{Cite book|title=A Cognitive Theory of Cultural Meaning|last= Strauss |first= Claudia |publisher= Cambridge University Press|year=1999|isbn=978-0-521-59541-4|pages=156–164}}</ref> For example, consider the English word ''big''. When used in a comparison ("That is a big tree"), the author's intent is to imply that the tree is ''physically large'' relative to other trees or the author's experience. When used metaphorically ("Tomorrow is a big day"), the author's intent is to imply ''importance''. The intent behind other usages, as in "She is a big person", will remain somewhat ambiguous to a person and a cognitive NLP algorithm alike without additional information.
# Assign relative measures of meaning to a word, phrase, sentence or piece of text based on the information presented before and after the piece of text being analyzed, e.g., by means of a [[probabilistic context-free grammar]] (PCFG). The mathematical equation for such algorithms is presented in [https://worldwide.espacenet.com/patent/search/family/055314712/publication/US9269353B1?q=pn%3DUS9269353 US Patent 9269353] {{Webarchive|url=https://web.archive.org/web/20240516102600/https://worldwide.espacenet.com/patent/search/family/055314712/publication/US9269353B1?q=pn=US9269353 |date=2024-05-16 }}:<ref>{{cite patent |country=US |number=9269353|status=patent}}</ref>
::<math> {RMM(token_N)}
=
|