Content deleted Content added
Tags: Reverted Mobile edit Mobile app edit Android app edit App section source |
Tags: Reverted references removed Mobile edit Mobile app edit Android app edit App section source |
||
Line 13:
{{See|History of natural language processing}}
Natural language processing has its roots in the 1950s.<ref>{{Cite web |title=NLP |url=https://cs.stanford.edu/people/eroberts/courses/soco/projects/2004-05/nlp/overview_history.html}}</
=== Symbolic NLP (1950s – early 1990s) ===
The premise of symbolic NLP is well-summarized by [[John Searle]]'s [[Chinese room]] experiment: Given a collection of rules (e.g., a Chinese phrasebook, with questions and matching answers), the computer emulates natural language understanding (or other NLP tasks) by applying those rules to the data it confronts.
* '''1950s''': The [[Georgetown-IBM experiment|Georgetown experiment]] in 1954 involved fully [[automatic translation]] of more than sixty Russian sentences into English. The authors claimed that within three or five years, machine translation would be a solved problem.<
* '''1960s''': Some notably successful natural language processing systems developed in the 1960s were [[SHRDLU]], a natural language system working in restricted "[[blocks world]]s" with restricted vocabularies, and [[ELIZA]], a simulation of a [[Rogerian psychotherapy|Rogerian psychotherapist]], written by [[Joseph Weizenbaum]] between 1964 and 1966. Using almost no information about human thought or emotion, ELIZA sometimes provided a startlingly human-like interaction. When the "patient" exceeded the very small knowledge base, ELIZA might provide a generic response, for example, responding to "My head hurts" with "Why do you say your head hurts?". Ross Quillian's successful work on natural language was demonstrated with a vocabulary of only ''twenty'' words, because that was all that would fit in a computer memory at the time.<ref>{{Harvnb|Crevier|1993|pp=146–148}}, see also {{Harvnb|Buchanan|2005|p=56}}: "Early programs were necessarily limited in scope by the size and speed of memory"</ref>
Line 29:
*'''2000s''': With the growth of the web, increasing amounts of raw (unannotated) language data have become available since the mid-1990s. Research has thus increasingly focused on [[unsupervised learning|unsupervised]] and [[semi-supervised learning]] algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than [[supervised learning]], and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the [[World Wide Web]]), which can often make up for the worse efficiency if the algorithm used has a low enough [[time complexity]] to be practical.
*'''2003:''' [[word n-gram language model|word n-gram model]], at the time the best statistical algorithm, is outperformed by a [[multi-layer perceptron]] (with a single hidden layer and context length of several words, trained on up to 14 million words, by [[Yoshua Bengio|Bengio]] et al.)<ref>{{Cite journal|url=https://dl.acm.org/doi/10.5555/944919.944966|title=A neural probabilistic language model|first1=Yoshua|last1=Bengio|first2=Réjean|last2=Ducharme|first3=Pascal|last3=Vincent|first4=Christian|last4=Janvin|date=March 1, 2003|journal=The Journal of Machine Learning Research|volume=3|pages=1137–1155|via=ACM Digital Library}}</ref>
*'''2010:''' [[Tomáš Mikolov]] (then a PhD student at [[Brno University of Technology]]) with co-authors applied a simple [[recurrent neural network]] with a single hidden layer to language modelling,<ref>{{cite book |last1=Mikolov |first1=Tomáš |last2=Karafiát |first2=Martin |last3=Burget |first3=Lukáš |last4=Černocký |first4=Jan |last5=Khudanpur |first5=Sanjeev |title=Interspeech 2010 |chapter=Recurrent neural network based language model |journal=Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 |date=26 September 2010 |pages=1045–1048 |doi=10.21437/Interspeech.2010-343 |s2cid=17048224 |chapter-url=https://gwern.net/doc/ai/nn/rnn/2010-mikolov.pdf |language=en}}</ref> and in the following years he went on to develop [[Word2vec]]. In the 2010s, [[representation learning]] and [[deep learning|deep neural network]]-style (featuring many hidden layers) machine learning methods became widespread in natural language processing. That popularity was due partly to a flurry of results showing that such techniques<ref name="goldberg:nnlp17">{{cite journal |last=Goldberg |first=Yoav |year=2016 |arxiv=1807.10854 |title=A Primer on Neural Network Models for Natural Language Processing |journal=Journal of Artificial Intelligence Research |volume=57 |pages=345–420 |doi=10.1613/jair.4992 |s2cid=8273530 }}</ref><ref name="goodfellow:book16">{{cite book |first1=Ian |last1=Goodfellow |first2=Yoshua |last2=Bengio |first3=Aaron |last3=Courville |url=http://www.deeplearningbook.org/ |title=Deep Learning |publisher=MIT Press |year=2016 }}</ref> can achieve state-of-the-art results in many natural language tasks, e.g., in [[language modeling]]<ref name="
==Approaches: Symbolic, statistical, neural networks{{anchor|Statistical natural language processing (SNLP)}} ==
|