Revision as of 08:21, 11 August 2024 by Iohla (207 edits). No edit summary. Tag: Visual edit		Revision as of 08:28, 11 August 2024 by Iohla (207 edits). Edit summary: changed the spelling and added the correct tenses. Tag: Visual edit
Line 24: === Neural NLP (present) === In 2003, the [[word n-gram language model\|word n-gram model]], at the time the best statistical algorithm, was ~~overperformed~~outperformed by a [[multi-layer perceptron]] (with a single hidden layer and a context length of several words, trained on up to 14 million words with a CPU cluster for [[language model]]ling) developed by [[Yoshua Bengio]] and co-authors.<ref>{{Cite journal\|url=https://dl.acm.org/doi/10.5555/944919.944966\|title=A neural probabilistic language model\|first1=Yoshua\|last1=Bengio\|first2=Réjean\|last2=Ducharme\|first3=Pascal\|last3=Vincent\|first4=Christian\|last4=Janvin\|date=March 1, 2003\|journal=The Journal of Machine Learning Research\|volume=3\|pages=1137–1155\|via=ACM Digital Library}}</ref> In 2010, [[Tomáš Mikolov]] (then a PhD student at [[Brno University of Technology]]) and co-authors applied a simple [[recurrent neural network]] with a single hidden layer to language modelling,<ref>{{cite book \|last1=Mikolov \|first1=Tomáš \|last2=Karafiát \|first2=Martin \|last3=Burget \|first3=Lukáš \|last4=Černocký \|first4=Jan \|last5=Khudanpur \|first5=Sanjeev \|title=Interspeech 2010 \|chapter=Recurrent neural network based language model \|journal=Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010 \|date=26 September 2010 \|pages=1045–1048 \|doi=10.21437/Interspeech.2010-343 \|s2cid=17048224 \|chapter-url=https://gwern.net/doc/ai/nn/rnn/2010-mikolov.pdf \|language=en}}</ref> and in the following years he went on to develop [[Word2vec]]. In the 2010s, [[representation learning]] and [[deep learning\|deep neural network]]-style (featuring many hidden layers) machine learning methods became widespread in natural language processing. 
That popularity was due partly to a flurry of results showing that such techniques<ref name="goldberg:nnlp17">{{cite journal \|last=Goldberg \|first=Yoav \|year=2016 \|arxiv=1807.10854 \|title=A Primer on Neural Network Models for Natural Language Processing \|journal=Journal of Artificial Intelligence Research \|volume=57 \|pages=345–420 \|doi=10.1613/jair.4992 \|s2cid=8273530 }}</ref><ref name="goodfellow:book16">{{cite book \|first1=Ian \|last1=Goodfellow \|first2=Yoshua \|last2=Bengio \|first3=Aaron \|last3=Courville \|url=http://www.deeplearningbook.org/ \|title=Deep Learning \|publisher=MIT Press \|year=2016 }}</ref> can achieve state-of-the-art results in many natural language tasks, e.g., in [[language modeling]]<ref name="jozefowicz:lm16">{{cite book \|first1=Rafal \|last1=Jozefowicz \|first2=Oriol \|last2=Vinyals \|first3=Mike \|last3=Schuster \|first4=Noam \|last4=Shazeer \|first5=Yonghui \|last5=Wu \|year=2016 \|arxiv=1602.02410 \|title=Exploring the Limits of Language Modeling \|bibcode=2016arXiv160202410J }}</ref> and parsing.<ref name="choe:emnlp16">{{cite journal \|first1=Do Kook \|last1=Choe \|first2=Eugene \|last2=Charniak \|journal=Emnlp 2016 \|url=https://aclanthology.coli.uni-saarland.de/papers/D16-1257/d16-1257 \|title=Parsing as Language Modeling \|access-date=2018-10-22 \|archive-date=2018-10-23 \|archive-url=https://web.archive.org/web/20181023034804/https://aclanthology.coli.uni-saarland.de/papers/D16-1257/d16-1257 \|url-status=dead }}</ref><ref name="vinyals:nips15">{{cite journal \|last1=Vinyals \|first1=Oriol \|last2=Kaiser \|first2=Lukasz \|display-authors=1 \|journal=Nips2015 \|title=Grammar as a Foreign Language \|year=2014 \|arxiv=1412.7449 \|bibcode=2014arXiv1412.7449V \|url=https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf }}</ref> This is increasingly important [[artificial intelligence in healthcare\|in medicine and healthcare]], where NLP helps analyze notes and text in [[Electronic health record\|electronic 
health records]] that would otherwise be inaccessible for study when seeking to improve care<ref>{{Cite journal\|last1=Turchin\|first1=Alexander\|last2=Florez Builes\|first2=Luisa F.\|date=2021-03-19\|title=Using Natural Language Processing to Measure and Improve Quality of Diabetes Care: A Systematic Review\|journal=Journal of Diabetes Science and Technology\|volume=15\|issue=3\|language=en\|pages=553–560\|doi=10.1177/19322968211000831\|pmid=33736486\|pmc=8120048\|issn=1932-2968}}</ref> or protect patient privacy.<ref>{{Cite journal \|last1=Lee \|first1=Jennifer \|last2=Yang \|first2=Samuel \|last3=Holland-Hall \|first3=Cynthia \|last4=Sezgin \|first4=Emre \|last5=Gill \|first5=Manjot \|last6=Linwood \|first6=Simon \|last7=Huang \|first7=Yungui \|last8=Hoffman \|first8=Jeffrey \|date=2022-06-10 \|title=Prevalence of Sensitive Terms in Clinical Notes Using Natural Language Processing Techniques: Observational Study \|journal=JMIR Medical Informatics \|language=en \|volume=10 \|issue=6 \|pages=e38482 \|doi=10.2196/38482 \|issn=2291-9694 \|pmc=9233261 \|pmid=35687381 \|doi-access=free }}</ref> Line 54: === Neural networks === {{Further\|Artificial neural network}} A major drawback of statistical methods is that they require elaborate [[feature engineering]]. Since 2015,<ref>{{Cite web \|last=Socher \|first=Richard \|title=Deep Learning For NLP-ACL 2012 Tutorial \|url=https://www.socher.org/index.php/Main/DeepLearningForNLP-ACL2012Tutorial \|access-date=2020-08-17 \|website=www.socher.org}} This was an early Deep Learning tutorial at ACL 2012 and was met with both interest and (at the time) skepticism by most participants. Until then, neural learning was basically rejected because of its lack of statistical interpretability. By 2015, deep learning had evolved into the major framework of NLP. 
[Link is broken, try http://web.stanford.edu/class/cs224n/]</ref> the statistical approach ~~was~~has been replaced by the [[Artificial neural network\|neural networks]] approach, using [[Semantic networks\|semantic networks]]<ref>{{cite book \|last1=Segev \|first1=Elad \|title=Semantic Network Analysis in Social Sciences \|date=2022 \|publisher=Routledge \|location=London \|isbn=9780367636524 \|url=https://www.routledge.com/Semantic-Network-Analysis-in-Social-Sciences/Segev/p/book/9780367636524 \|access-date=5 December 2021 \|archive-date=5 December 2021 \|archive-url=https://web.archive.org/web/20211205140726/https://www.routledge.com/Semantic-Network-Analysis-in-Social-Sciences/Segev/p/book/9780367636524 \|url-status=live }}</ref> and [[word embedding]]s to capture semantic properties of words. Intermediate tasks (e.g., part-of-speech tagging and dependency parsing) ~~have~~are not ~~been~~ needed anymore. [[Neural machine translation]], based on then-newly-invented [[Seq2seq\|sequence-to-sequence]] transformations, made obsolete the intermediate steps, such as word alignment, previously necessary for [[statistical machine translation]].

Natural language processing: Difference between revisions