{{Short description|none}}
The '''history of natural language processing''' describes the advances of [[natural language processing]]. There is some overlap with the [[history of machine translation]], the [[history of speech recognition]], and the [[history of artificial intelligence]].
== Early history ==
The history of machine translation dates back to the seventeenth century, when philosophers such as [[Gottfried Wilhelm Leibniz|Leibniz]] and [[Descartes]] put forward proposals for codes which would relate words between languages. All of these proposals remained theoretical, and none resulted in the development of an actual machine.
The first patents for "translating machines" were applied for in the mid-1930s, including a detailed proposal by the Russian pioneer Petr Troyanskii.<ref>{{cite journal
| last1 = Hutchins
| first1 = John
| last2 = Lovtskii
| first2 = Evgenii
| year = 2000
| title = Petr Petrovich Troyanskii (1894-1950): A Forgotten Pioneer of Mechanical Translation
| journal = Machine Translation
| url = https://www.jstor.org/stable/40009018
}}</ref>
== Logical period ==
In 1950, [[Alan Turing]] published his famous article "[[Computing Machinery and Intelligence]]", which proposed what is now called the [[Turing test]] as a criterion of intelligence.
In 1957, [[Noam Chomsky]]'s ''[[Syntactic Structures]]'' revolutionized linguistics with '[[universal grammar]]', a rule-based system of syntactic structures.<ref>{{cite web
| url = http://www.cs.bham.ac.uk/~pjh/sem1a5/pt1/pt1_history.html
| title = SEM1A5 - Part 1 - A brief history of NLP
}}</ref>
A notably successful NLP system developed in the 1960s was [[SHRDLU]], a natural language system working in restricted "[[blocks world]]s" with restricted vocabularies.
In 1969 [[Roger Schank]] introduced the [[conceptual dependency theory]] for natural language understanding.<ref>[[Roger Schank]], 1969, ''A conceptual dependency parser for natural language'' Proceedings of the 1969 conference on Computational linguistics, Sång-Säby, Sweden, pages 1-3</ref> This model, partially influenced by the work of [[Sydney Lamb]], was extensively used by Schank's students at [[Yale University]], such as Robert Wilensky, Wendy Lehnert, and [[Janet Kolodner]].
In 1970, William A. Woods introduced the [[augmented transition network]] (ATN) to represent natural language input.<ref>{{cite journal |last=Woods |first=William A. |year=1970 |title=Transition Network Grammars for Natural Language Analysis |journal=Communications of the ACM |volume=13 |issue=10 |pages=591–606 |url=http://www.eric.ed.gov/ERICWebPortal/custom/portlets/recordDetails/detailmini.jsp?_nfpb=true&_&ERICExtSearch_SearchValue_0=ED037733&ERICExtSearch_SearchType_0=no&accno=ED037733}}</ref> Instead of ''[[phrase structure rules]]'', ATNs used an equivalent set of [[finite state automata]] that were called recursively.
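The underlying idea can be sketched as follows. The minimal illustration below shows a recursive transition network, the unaugmented core of an ATN; the toy grammar, lexicon and state names are invented for this example and do not come from Woods's paper.

<syntaxhighlight lang="python">
# A recursive transition network (RTN): each network is a small finite-state
# automaton whose arcs either consume a word of a given lexical category or
# recursively invoke another network.  ATNs extend this scheme with registers
# and tests on the arcs.

LEXICON = {"the": "DET", "dog": "N", "cat": "N", "saw": "V"}

# network name -> {state: [(arc label, next state), ...]}; "POP" means accept
NETWORKS = {
    "S":  {"q0": [("NP", "q1")], "q1": [("VP", "POP")]},
    "NP": {"q0": [("DET", "q1")], "q1": [("N", "POP")]},
    "VP": {"q0": [("V", "q1")], "q1": [("NP", "POP")]},
}

def traverse(net, words, pos, state="q0"):
    """Return the input position reached when `net` accepts, or None on failure."""
    if state == "POP":
        return pos
    for label, nxt in NETWORKS[net].get(state, []):
        if label in NETWORKS:                  # arc names a subnetwork: recursive call
            end = traverse(label, words, pos)
            if end is not None:
                done = traverse(net, words, end, nxt)
                if done is not None:
                    return done
        elif pos < len(words) and LEXICON.get(words[pos]) == label:  # terminal arc
            done = traverse(net, words, pos + 1, nxt)
            if done is not None:
                return done
    return None

sentence = "the dog saw the cat".split()
print(traverse("S", sentence, 0) == len(sentence))   # True: the sentence is accepted
</syntaxhighlight>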
== Statistical period ==
{{anchor|Machine learning}}
Up to the 1980s, most NLP systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in NLP with the introduction of [[machine learning]] algorithms for language processing. This was due both to the steady increase in computational power resulting from [[Moore's law]] and the gradual lessening of the dominance of [[Noam Chomsky|Chomskyan]] theories of linguistics (e.g. [[transformational grammar]]), whose theoretical underpinnings discouraged the sort of [[corpus linguistics]] that underlies the machine-learning approach to language processing.<ref>Chomskyan linguistics encourages the investigation of "[[corner case]]s" that stress the limits of its theoretical models (comparable to [[pathological (mathematics)|pathological]] phenomena in mathematics), typically created using [[thought experiment]]s, rather than the systematic investigation of typical phenomena that occur in real-world data, as is the case in [[corpus linguistics]]. The creation and use of such [[text corpus|corpora]] of real-world data is a fundamental part of machine-learning algorithms for NLP. In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called "[[poverty of the stimulus]]" argument entail that general learning algorithms, as are typically used in machine learning, cannot be successful in language processing. As a result, the Chomskyan paradigm discouraged the application of such models to language processing.</ref> Some of the earliest-used machine learning algorithms, such as [[decision tree]]s, produced systems of hard if-then rules similar to existing hand-written rules. Increasingly, however, research has focused on [[statistical natural language processing|statistical models]], which make soft, [[probabilistic]] decisions based on attaching [[real-valued]] weights to the features making up the input data. The [[cache language model]]s upon which many [[speech recognition]] systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks.
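The contrast between hard rules and soft, weighted decisions can be illustrated with a toy example; the task, features and weights below are invented for illustration and are not taken from any cited system.

<syntaxhighlight lang="python">
# A hard if-then rule either fires or not, while a statistical model sums
# real-valued feature weights and outputs a probability.
import math

def rule_based_is_question(sentence: str) -> bool:
    # hard rule: brittle on noisy input such as a missing question mark
    return sentence.strip().endswith("?")

WEIGHTS = {"ends_with_?": 3.0, "starts_with_wh_word": 1.5, "contains_please": -0.5}

def features(sentence: str) -> dict:
    tokens = sentence.lower().split()
    return {
        "ends_with_?": sentence.strip().endswith("?"),
        "starts_with_wh_word": bool(tokens) and tokens[0] in {"who", "what", "when", "where", "why", "how"},
        "contains_please": "please" in tokens,
    }

def statistical_is_question(sentence: str) -> float:
    # soft decision: a logistic function of the weighted feature sum
    score = sum(WEIGHTS[name] for name, on in features(sentence).items() if on)
    return 1.0 / (1.0 + math.exp(-score))

print(rule_based_is_question("where is the station"))             # False: the rule misses it
print(round(statistical_is_question("where is the station"), 2))  # ~0.82: still likely a question
</syntaxhighlight>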
=== Datasets ===
The emergence of statistical approaches was aided both by the increase in computing power and by the availability of large datasets. At that time, large multilingual corpora were starting to emerge. Notably, some were produced by the [[Parliament of Canada]] and the [[European Union]] as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government.
Many of the notable early successes occurred in the field of [[machine translation]]. In 1993, the [[IBM alignment models]] were used for [[statistical machine translation]].<ref name="U4RiN">{{cite journal |last1=Brown |first1=Peter F. |year=1993 |title=The mathematics of statistical machine translation: Parameter estimation |journal=Computational Linguistics |volume=19 |pages=263–311}}</ref> Compared to previous machine translation systems, which were symbolic systems manually coded by computational linguists, these systems were statistical, which allowed them to automatically learn from large [[text corpus|textual corpora]]. These systems did not, however, work well in situations where only small corpora were available, so data-efficient methods continue to be an area of research and development.
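The simplest member of this family, IBM Model 1, can be sketched as follows, using a standard textbook presentation rather than the cited paper's notation. The probability of a foreign sentence <math>f = f_1 \ldots f_m</math> given an English sentence <math>e = e_0 \ldots e_l</math> (with <math>e_0</math> a null word) is built entirely from word-for-word translation probabilities <math>t(f_j \mid e_i)</math>, which are estimated from a parallel corpus with the expectation–maximization algorithm:

<math display="block">p(f \mid e) = \frac{\epsilon}{(l+1)^m} \prod_{j=1}^{m} \sum_{i=0}^{l} t(f_j \mid e_i)</math>

where <math>\epsilon</math> is a constant accounting for the choice of target sentence length.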
In 2001, a one-billion-word large text corpus, scraped from the Internet, referred to as "very very large" at the time, was used for word [[Word-sense disambiguation|disambiguation]].<ref name="2001_very_very_large_corpora">{{cite journal |last1=Banko |first1=Michele |last2=Brill |first2=Eric |date=2001 |title=Scaling to very very large corpora for natural language disambiguation |journal=Proceedings of the 39th Annual Meeting on Association for Computational Linguistics - ACL '01 |___location=Morristown, NJ, USA |publisher=Association for Computational Linguistics |pages=26–33 |doi=10.3115/1073012.1073017 |s2cid=6645623 |doi-access=free}}</ref>
To take advantage of large, unlabelled datasets, algorithms were developed for [[Unsupervised learning|unsupervised]] and [[self-supervised learning]]. Generally, this task is much more difficult than [[supervised learning]], and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the [[World Wide Web]]), which can often make up for the inferior results.
== Neural period ==
[[File:A_development_of_natural_language_processing_tools.png|thumb|Timeline of natural language processing models]]
Neural [[Language Model|language models]] were developed in the 1990s. In 1990, the [[Recurrent neural network#Elman networks and Jordan networks|Elman network]], using a [[recurrent neural network]], encoded each word in a training set as a vector, called a [[word embedding]], and the whole vocabulary as a [[vector database]], allowing it to perform such tasks as sequence prediction that are beyond the power of a simple [[multilayer perceptron]]. A shortcoming of the static embeddings was that they did not differentiate between multiple meanings of [[Homonym|homonyms]].<ref name="1990_ElmanPaper">{{cite journal |last=Elman |first=Jeffrey L. |date=March 1990 |title=Finding Structure in Time |url=http://doi.wiley.com/10.1207/s15516709cog1402_1 |journal=Cognitive Science |volume=14 |issue=2 |pages=179–211 |doi=10.1207/s15516709cog1402_1 |s2cid=2763403 |url-access=subscription}}</ref>
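A minimal sketch of a static word-embedding table (the three-dimensional toy vectors are invented for illustration) shows why homonyms are not distinguished: every occurrence of a word is mapped to the same vector, regardless of context.

<syntaxhighlight lang="python">
# Static embeddings: one fixed vector per word form.
import numpy as np

embedding = {
    "river": np.array([0.9, 0.1, 0.0]),
    "money": np.array([0.0, 0.9, 0.1]),
    "bank":  np.array([0.4, 0.5, 0.1]),   # one vector shared by both senses of "bank"
}

def encode(sentence):
    # a sentence becomes a sequence of vectors, e.g. as input to a recurrent network
    return [embedding[w] for w in sentence.split() if w in embedding]

# The "bank" vector is identical in both contexts, so the two senses collapse.
print(np.array_equal(encode("river bank")[-1], encode("money bank")[-1]))  # True
</syntaxhighlight>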
Yoshua Bengio and his co-authors developed the first neural probabilistic language model in 2000.<ref>{{cite journal
| last = Bengio
| first = Yoshua
| author-link = Yoshua Bengio
| title = A Neural Probabilistic Language Model
| journal = Journal of Machine Learning Research
| volume = 3
| date = 2003
| pages = 1137–1155
| doi = 10.1162/153244303322533223
| doi-access = free
}}</ref>
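A schematic sketch of such a model's forward pass is given below; the vocabulary, dimensions and random weights are illustrative and are not taken from the paper. The previous words are mapped to learned embeddings, concatenated, passed through a tanh hidden layer, and a softmax over the vocabulary yields the probability of the next word.

<syntaxhighlight lang="python">
# Feed-forward neural probabilistic language model, schematically.
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "dog", "saw", "cat", "<unk>"]
V, d, h, context = len(vocab), 8, 16, 2          # vocabulary, embedding, hidden, context sizes

C = rng.normal(size=(V, d))                      # embedding matrix (lookup table)
H = rng.normal(size=(h, context * d))            # input-to-hidden weights
U = rng.normal(size=(V, h))                      # hidden-to-output weights
b, dvec = rng.normal(size=V), rng.normal(size=h) # output and hidden biases

def next_word_distribution(prev_words):
    x = np.concatenate([C[vocab.index(w)] for w in prev_words])  # concatenated embeddings
    a = np.tanh(dvec + H @ x)                                    # hidden layer
    y = b + U @ a                                                # output scores
    e = np.exp(y - y.max())
    return e / e.sum()                                           # softmax over the vocabulary

p = next_word_distribution(["the", "dog"])
print({w: round(float(pi), 3) for w, pi in zip(vocab, p)})       # probabilities sum to 1
</syntaxhighlight>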
In recent years, advancements in deep learning and large language models have significantly enhanced the capabilities of natural language processing, leading to widespread applications in areas such as healthcare, customer service, and content generation.<ref>{{Cite news |last=Gruetzemacher |first=Ross |date=2022-04-19 |title=The Power of Natural Language Processing |url=https://hbr.org/2022/04/the-power-of-natural-language-processing |access-date=2024-12-07 |work=Harvard Business Review |issn=0017-8012}}</ref>
==Software==
{| class="wikitable"
! style="background-color:#ECE9EF;" | Software
! style="background-color:#FFF6D6;" | Year
! Creator
! Description
! style="background-color:#EEF6D6;" | Reference
|-
|'''[[Georgetown–IBM experiment|Georgetown experiment]] '''
|1954
|[[Georgetown University]] and [[IBM]]
|Involved the fully automatic translation of more than sixty Russian sentences into English.
|
|-
|'''[[MOPTRANS]] '''<ref>Janet L. Kolodner, Christopher K. Riesbeck; ''Experience, Memory, and Reasoning''; Psychology Press; 2014 reprint</ref>
|1984
|Lytinen
|
|
|-
|
|1987
|Hirst
|
|-
|'''[[Dr. Sbaitso]] '''
|1991
|[[Creative Labs]]
|
|-
|'''[[IBM Watson|Watson]] '''
|
|[[IBM]]
|A question answering system that won the [[Jeopardy!]] contest, defeating the best human players in February 2011.
|-
|'''[[Siri]] '''
|2011
|[[Apple_Inc.|Apple]]
|A virtual assistant developed by Apple.
|-
|'''[[Cortana (virtual assistant)|Cortana]] '''
|2014
|[[Microsoft]]
|A virtual assistant developed by Microsoft.
|-
|'''[[Amazon Alexa]] '''
|2014
|[[Amazon_(company)|Amazon]]
|A virtual assistant developed by Amazon.
|-
|'''[[Google Assistant]] '''
|2016
|[[Google]]
|A virtual assistant developed by Google.
|
|}
==References==
{{Reflist}}
==Bibliography==
* {{Crevier 1993}}
* {{Citation | last=McCorduck | first=Pamela | title = Machines Who Think | year = 2004 | edition=2nd | ___location=Natick, MA | publisher=A. K. Peters, Ltd. | isbn=978-1-56881-205-2 | oclc=52197627}}.
* {{Russell Norvig 2003}}.
[[Category:History of artificial intelligence|natural language processing]]
[[Category:Natural language processing]]
[[Category:History of linguistics|natural language processing]]
[[Category:History of software|natural language processing]]
[[Category:Software topical history overviews|natural language processing]]