Natural language processing: Difference between revisions

m Removed the 404 hyperlink from Context length, as it was ruining the user/reader experience.
m Reverted 2 edits by EdwinDareck234 (talk) to last revision by InternetArchiveBot
 
(6 intermediate revisions by 5 users not shown)
Line 1:
{{Short description|Processing of natural language by a computer}}
{{Multiple issues|
{{More citations needed|date=May 2024}}
{{Cleanup rewrite|date=July 2025}}
{{Cleanup reorganize|date=July 2025}}
}}
'''Natural language processing''' ('''NLP''') is the processing of [[natural language]] information by a [[computer]]. The study of NLP, a subfield of [[computer science]], is generally associated with [[artificial intelligence]]. NLP is closely related to [[information retrieval]], [[knowledge representation]], [[computational linguistics]], and, more broadly, [[linguistics]].<ref name="nlpintro">
{{cite book |last=Eisenstein |first=Jacob |date=October 1, 2019 |title=Introduction to Natural Language Processing |url=https://mitpress.mit.edu/9780262042840/introduction-to-natural-language-processing/ |location= |publisher=The MIT Press |page=1 |isbn=9780262042840 |access-date=}}</ref>
 
Major processing tasks in an NLP system include: [[speech recognition]], [[text classification]], [[natural-language understanding|natural language understanding]], and [[natural language generation]].
 
== History ==
Line 13 ⟶ 18:
The premise of symbolic NLP is well summarized by [[John Searle]]'s [[Chinese room]] experiment: given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts.
 
* '''1950s''': The [[Georgetown-IBM experiment|Georgetown experiment]] in 1954 involved fully [[automatic translation]] of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem.<ref>{{cite web|author=Hutchins, J.|year=2005|url=http://www.hutchinsweb.me.uk/Nutshell-2005.pdf|title=The history of machine translation in a nutshell|access-date=2019-02-04|archive-date=2019-07-13|archive-url=https://web.archive.org/web/20190713103044/http://www.hutchinsweb.me.uk/Nutshell-2005.pdf|url-status=dead}}{{self-published source|date=December 2013}}</ref> However, real progress was much slower, and after the [[ALPAC|ALPAC report]] in 1966, which found that ten years of research had failed to fulfill expectations, funding for machine translation was dramatically reduced. Little further research in machine translation was conducted in America (though some research continued elsewhere, such as in Japan and Europe<ref>"ALPAC: the (in)famous report", John Hutchins, MT News International, no. 14, June 1996, pp. 9–12.</ref>) until the late 1980s, when the first [[statistical machine translation]] systems were developed.
* '''1960s''': Some notably successful natural language processing systems developed in the 1960s were [[SHRDLU]], a natural language system working in restricted "[[blocks world]]s" with restricted vocabularies, and [[ELIZA]], a simulation of a [[Rogerian psychotherapy|Rogerian psychotherapist]], written by [[Joseph Weizenbaum]] between 1964 and 1966. Using almost no information about human thought or emotion, ELIZA sometimes provided a startlingly human-like interaction. When the "patient" exceeded the very small knowledge base, ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?". Ross Quillian's successful work on natural language was demonstrated with a vocabulary of only ''twenty'' words, because that was all that would fit in a computer memory at the time.<ref>{{Harvnb|Crevier|1993|pp=146–148}}, see also {{Harvnb|Buchanan|2005|p=56}}: "Early programs were necessarily limited in scope by the size and speed of memory"</ref>
 
* '''1970s''': During the 1970s, many programmers began to write "conceptual [[ontology (information science)|ontologies]]", which structured real-world information into computer-understandable data. Examples are MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert, 1981). During this time, the first [[chatterbots]] were written (e.g., [[PARRY]]).
* '''1980s''': The 1980s and early 1990s mark the heyday of symbolic methods in NLP. Focus areas of the time included research on rule-based parsing (e.g., the development of [[Head-driven phrase structure grammar|HPSG]] as a computational operationalization of [[generative grammar]]), morphology (e.g., two-level morphology<ref>{{citation|last=Koskenniemi|first=Kimmo|title=Two-level morphology: A general computational model of word-form recognition and production|url=http://www.ling.helsinki.fi/~koskenni/doc/Two-LevelMorphology.pdf|year=1983|publisher=Department of General Linguistics, [[University of Helsinki]]|author-link=Kimmo Koskenniemi|access-date=2020-08-20|archive-date=2018-12-21|archive-url=https://web.archive.org/web/20181221032913/http://www.ling.helsinki.fi/~koskenni/doc/Two-LevelMorphology.pdf|url-status=dead}}</ref>), semantics (e.g., [[Lesk algorithm]]), reference (e.g., within Centering Theory<ref>Joshi, A. K., & Weinstein, S. (1981, August). [https://www.ijcai.org/Proceedings/81-1/Papers/071.pdf Control of Inference: Role of Some Aspects of Discourse Structure-Centering]. In ''IJCAI'' (pp. 385–387).</ref>) and other areas of natural language understanding (e.g., in the [[Rhetorical structure theory|Rhetorical Structure Theory]]). Other lines of research were continued, e.g., the development of chatterbots with [[Racter]] and [[Jabberwacky]]. An important development (that eventually led to the statistical turn in the 1990s) was the rising importance of quantitative evaluation in this period.<ref>{{Cite journal|last1=Guida|first1=G.|last2=Mauri|first2=G.|date=July 1986|title=Evaluation of natural language processing systems: Issues and approaches|journal=Proceedings of the IEEE|volume=74|issue=7|pages=1026–1035|doi=10.1109/PROC.1986.13580|s2cid=30688575|issn=1558-2256}}</ref>
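
The rule-application idea behind symbolic systems, such as the ELIZA exchange quoted in the 1960s entry above, can be illustrated with a minimal sketch. The rule table, the fallback sentence and the function name <code>respond</code> are hypothetical and are not taken from ELIZA's actual script; the sketch only assumes that a response is produced by matching a surface pattern and filling a template.

```python
import re

# Hypothetical rule table: (surface pattern, response template).
# These rules are invented for illustration; they are not ELIZA's script.
RULES = [
    (re.compile(r"^my (.+)$", re.IGNORECASE), "Why do you say your {0}?"),
]

# Assumed generic reply used when no rule matches.
FALLBACK = "Please tell me more."


def respond(utterance: str) -> str:
    """Return a canned reply by applying the first matching rule."""
    # Drop surrounding whitespace and trailing sentence punctuation.
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            # Reuse the user's own words inside the template.
            return template.format(*match.groups())
    return FALLBACK


print(respond("My head hurts"))
```

The program has no model of heads or pain; it only rearranges the input string, which is the point made by the Chinese room comparison.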
 
=== Statistical NLP (1990s–present) ===
Line 51 ⟶ 56:
=== Neural networks ===
{{Further|Artificial neural network}}
A major drawback of statistical methods is that they require elaborate [[feature engineering]]. Since 2015,<ref>{{Cite web |last=Socher |first=Richard |title=Deep Learning For NLP-ACL 2012 Tutorial |url=https://www.socher.org/index.php/Main/DeepLearningForNLP-ACL2012Tutorial |access-date=2020-08-17 |website=www.socher.org |archive-date=2021-04-14 |archive-url=https://web.archive.org/web/20210414054126/https://www.socher.org/index.php/Main/DeepLearningForNLP-ACL2012Tutorial |url-status=dead }} This was an early deep learning tutorial at ACL 2012 and was met with both interest and (at the time) skepticism by most participants. Until then, neural learning was basically rejected because of its lack of statistical interpretability. By 2015, deep learning had evolved into the major framework of NLP. [Link is broken, try http://web.stanford.edu/class/cs224n/]</ref> the statistical approach has been replaced by the [[Artificial neural network|neural network]] approach, using [[semantic networks]]<ref>{{cite book |last1=Segev |first1=Elad |title=Semantic Network Analysis in Social Sciences |date=2022 |publisher=Routledge |location=London |isbn=9780367636524 |url=https://www.routledge.com/Semantic-Network-Analysis-in-Social-Sciences/Segev/p/book/9780367636524 |access-date=5 December 2021 |archive-date=5 December 2021 |archive-url=https://web.archive.org/web/20211205140726/https://www.routledge.com/Semantic-Network-Analysis-in-Social-Sciences/Segev/p/book/9780367636524 |url-status=live }}</ref> and [[word embedding]]s to capture semantic properties of words.
 
Intermediate tasks (e.g., part-of-speech tagging and dependency parsing) are no longer needed.
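
How a word embedding can capture a semantic property may be easier to see with a small sketch. The three-dimensional vectors below are invented for illustration and do not come from any trained model; the only assumption is that related words are assigned vectors pointing in similar directions, so that cosine similarity can serve as a rough relatedness score.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# Real embeddings are learned from data and have far more dimensions.
EMBEDDINGS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


related = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["queen"])
unrelated = cosine_similarity(EMBEDDINGS["king"], EMBEDDINGS["apple"])
print(related > unrelated)
```

With these toy values the pair "king"/"queen" scores higher than "king"/"apple", mirroring the intuition that nearby vectors stand for semantically similar words.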
Line 123 ⟶ 128:
 
; [[Argument mining]]
:The goal of argument mining is the automatic extraction and identification of argumentative structures from [[natural language]] text with the aid of computer programs.<ref>{{Cite journal|last1=Lippi|first1=Marco|last2=Torroni|first2=Paolo|date=2016-04-20|title=Argumentation Mining: State of the Art and Emerging Trends|url=https://dl.acm.org/doi/10.1145/2850417|journal=ACM Transactions on Internet Technology|language=en|volume=16|issue=2|pages=1–25|doi=10.1145/2850417|hdl=11585/523460|s2cid=9561587|issn=1533-5399|hdl-access=free}}</ref> Such argumentative structures include premises, conclusions, the [[argument scheme]] and the relationship between the main and subsidiary argument, or the main and counter-argument within discourse.<ref>{{Cite web|title=Argument Mining – IJCAI2016 Tutorial|url=https://www.i3s.unice.fr/~villata/tutorialIJCAI2016.html|access-date=2021-03-09|website=www.i3s.unice.fr|archive-date=2021-04-18|archive-url=https://web.archive.org/web/20210418083659/https://www.i3s.unice.fr/~villata/tutorialIJCAI2016.html|url-status=dead}}</ref><ref>{{Cite web|title=NLP Approaches to Computational Argumentation – ACL 2016, Berlin|url=http://acl2016tutorial.arg.tech/|access-date=2021-03-09|language=en-GB}}</ref>
 
=== Higher-level NLP applications ===
Line 162 ⟶ 167:
 
# Apply the theory of [[conceptual metaphor]], explained by Lakoff as "the understanding of one idea, in terms of another", which provides an idea of the intent of the author.<ref>{{Cite book|title=A Cognitive Theory of Cultural Meaning|last= Strauss |first= Claudia |publisher= Cambridge University Press|year=1999|isbn=978-0-521-59541-4|pages=156–164}}</ref> For example, consider the English word ''big''. When used in a comparison ("That is a big tree"), the author's intent is to imply that the tree is ''physically large'' relative to other trees or the author's experience. When used metaphorically ("Tomorrow is a big day"), the author's intent is to imply ''importance''. The intent behind other usages, as in "She is a big person", will remain somewhat ambiguous to a person and a cognitive NLP algorithm alike without additional information.
# Assign relative measures of meaning to a word, phrase, sentence or piece of text based on the information presented before and after the piece of text being analyzed, e.g., by means of a [[probabilistic context-free grammar]] (PCFG). The mathematical equation for such algorithms is presented in [https://worldwide.espacenet.com/patent/search/family/055314712/publication/US9269353B1?q=pn%3DUS9269353 US Patent 9269353] {{Webarchive|url=https://web.archive.org/web/20240516102600/https://worldwide.espacenet.com/patent/search/family/055314712/publication/US9269353B1?q=pn=US9269353 |date=2024-05-16 }}:<ref>{{cite patent |country=US |number=9269353|status=patent}}</ref>
::<math> {RMM(token_N)}
=