Natural language processing: Difference between revisions

Content deleted Content added
m Removed the 404 hyperlink from Ross Quillian, as it was ruining the user/reader experiance.
m Removed the 404 hyperlink from Context length, as it was ruining the user/reader experience.
Line 23:
*'''1990s''': Many of the notable early successes in statistical methods in NLP occurred in the field of [[machine translation]], due especially to work at IBM Research, such as [[IBM alignment models]]. These systems were able to take advantage of existing multilingual [[text corpus|textual corpora]] that had been produced by the [[Parliament of Canada]] and the [[European Union]] as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, most other systems depended on corpora specifically developed for the tasks implemented by these systems, which was (and often continues to be) a major limitation in the success of these systems. As a result, a great deal of research has gone into methods of more effectively learning from limited amounts of data.
*'''2000s''': With the growth of the web, increasing amounts of raw (unannotated) language data have become available since the mid-1990s. Research has thus increasingly focused on [[unsupervised learning|unsupervised]] and [[semi-supervised learning]] algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than [[supervised learning]], and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the [[World Wide Web]]), which can often make up for the worse efficiency if the algorithm used has a low enough [[time complexity]] to be practical.
*'''2003:''' [[word n-gram language model|word n-gram model]], at the time the best statistical algorithm, is outperformed by a [[multi-layer perceptron]] (with a single hidden layer and [[context length]] of several words, trained on up to 14 million words, by [[Yoshua Bengio|Bengio]] et al.)<ref>{{Cite journal|url=https://dl.acm.org/doi/10.5555/944919.944966|title=A neural probabilistic language model|first1=Yoshua|last1=Bengio|first2=Réjean|last2=Ducharme|first3=Pascal|last3=Vincent|first4=Christian|last4=Janvin|date=March 1, 2003|journal=The Journal of Machine Learning Research|volume=3|pages=1137–1155|via=ACM Digital Library}}</ref>
*'''2010:''' [[Tomáš Mikolov]] (then a PhD student at [[Brno University of Technology]]) with co-authors applied a simple [[recurrent neural network]] with a single hidden layer to language modelling,<ref>{{cite book |last1=Mikolov |first1=Tomáš |last2=Karafiát |first2=Martin |last3=Burget |first3=Lukáš |last4=Černocký |first4=Jan |last5=Khudanpur |first5=Sanjeev |title=Interspeech 2010 |chapter=Recurrent neural network based language model |journal=Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 |date=26 September 2010 |pages=1045–1048 |doi=10.21437/Interspeech.2010-343 |s2cid=17048224 |chapter-url=https://gwern.net/doc/ai/nn/rnn/2010-mikolov.pdf |language=en}}</ref> and in the following years he went on to develop [[Word2vec]]. In the 2010s, [[representation learning]] and [[deep learning|deep neural network]]-style (featuring many hidden layers) machine learning methods became widespread in natural language processing. That popularity was due partly to a flurry of results showing that such techniques<ref name="goldberg:nnlp17">{{cite journal |last=Goldberg |first=Yoav |year=2016 |arxiv=1807.10854 |title=A Primer on Neural Network Models for Natural Language Processing |journal=Journal of Artificial Intelligence Research |volume=57 |pages=345–420 |doi=10.1613/jair.4992 |s2cid=8273530 }}</ref><ref name="goodfellow:book16">{{cite book |first1=Ian |last1=Goodfellow |first2=Yoshua |last2=Bengio |first3=Aaron |last3=Courville |url=http://www.deeplearningbook.org/ |title=Deep Learning |publisher=MIT Press |year=2016 }}</ref> can achieve state-of-the-art results in many natural language tasks, e.g., in [[language modeling]]<ref name="jozefowicz:lm16">{{cite book |first1=Rafal |last1=Jozefowicz |first2=Oriol |last2=Vinyals |first3=Mike |last3=Schuster |first4=Noam |last4=Shazeer |first5=Yonghui |last5=Wu |year=2016 |arxiv=1602.02410 |title=Exploring the Limits of Language Modeling |bibcode=2016arXiv160202410J }}</ref> and parsing.<ref name="choe:emnlp16">{{cite journal |first1=Do Kook |last1=Choe |first2=Eugene |last2=Charniak |journal=Emnlp 2016 |url=https://aclanthology.coli.uni-saarland.de/papers/D16-1257/d16-1257 |title=Parsing as Language Modeling |access-date=2018-10-22 |archive-date=2018-10-23 |archive-url=https://web.archive.org/web/20181023034804/https://aclanthology.coli.uni-saarland.de/papers/D16-1257/d16-1257 |url-status=dead }}</ref><ref name="vinyals:nips15">{{cite journal |last1=Vinyals |first1=Oriol |last2=Kaiser |first2=Lukasz |display-authors=1 |journal=Nips2015 |title=Grammar as a Foreign Language |year=2014 |arxiv=1412.7449 |bibcode=2014arXiv1412.7449V |url=https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf }}</ref> This is increasingly important [[artificial intelligence in healthcare|in medicine and healthcare]], where NLP helps analyze notes and text in [[Electronic health record|electronic health records]] that would otherwise be inaccessible for study when seeking to improve care<ref>{{Cite journal|last1=Turchin|first1=Alexander|last2=Florez Builes|first2=Luisa F.|date=2021-03-19|title=Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review|journal=Journal of Diabetes Science and Technology|volume=15|issue=3|language=en|pages=553–560|doi=10.1177/19322968211000831|pmid=33736486|pmc=8120048|issn=1932-2968}}</ref> or protect patient privacy.<ref>{{Cite journal |last1=Lee |first1=Jennifer |last2=Yang |first2=Samuel |last3=Holland-Hall |first3=Cynthia |last4=Sezgin |first4=Emre |last5=Gill |first5=Manjot |last6=Linwood |first6=Simon |last7=Huang |first7=Yungui |last8=Hoffman |first8=Jeffrey |date=2022-06-10 |title=Prevalence of Sensitive Terms in Clinical Notes Using Natural Language Processing Techniques: Observational Study |journal=JMIR Medical Informatics |language=en |volume=10 |issue=6 |pages=e38482 |doi=10.2196/38482 |issn=2291-9694 |pmc=9233261 |pmid=35687381 |doi-access=free }}</ref>