Around 2006, bidirectional LSTM started to revolutionize [[speech recognition]], outperforming traditional models in certain speech applications.<ref>{{Cite journal |last1=Graves |first1=Alex |last2=Schmidhuber |first2=Jürgen |date=2005-07-01 |title=Framewise phoneme classification with bidirectional LSTM and other neural network architectures |journal=Neural Networks |series=IJCNN 2005 |volume=18 |issue=5 |pages=602–610 |citeseerx=10.1.1.331.5800 |doi=10.1016/j.neunet.2005.06.042 |pmid=16112549 |s2cid=1856462}}</ref><ref name="fernandez2007keyword">{{Cite conference |last1=Fernández |first1=Santiago |last2=Graves |first2=Alex |last3=Schmidhuber |first3=Jürgen |year=2007 |title=An Application of Recurrent Neural Networks to Discriminative Keyword Spotting |url=http://dl.acm.org/citation.cfm?id=1778066.1778092 |book-title=Proceedings of the 17th International Conference on Artificial Neural Networks |series=ICANN'07 |location=Berlin, Heidelberg |publisher=Springer-Verlag |pages=220–229 |isbn=978-3-540-74693-5 }}</ref> LSTM also improved large-vocabulary speech recognition<ref name="sak2014" /><ref name="liwu2015" /> and [[text-to-speech]] synthesis<ref name="fan2015">{{cite conference |last1=Fan |first1=Bo |last2=Wang |first2=Lijuan |last3=Soong |first3=Frank K. |last4=Xie |first4=Lei |title=Photo-Real Talking Head with Deep Bidirectional LSTM |chapter-url= |editor= |book-title=Proceedings of ICASSP 2015 IEEE International Conference on Acoustics, Speech and Signal Processing |doi=10.1109/ICASSP.2015.7178899 |date=2015 |isbn=978-1-4673-6997-8 |pages=4884–8 }}</ref> and was used in [[Google Voice Search|Google voice search]] and dictation on [[Android (operating system)|Android devices]].<ref name="sak2015">{{Cite web |url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html |title=Google voice search: faster and more accurate |last1=Sak |first1=Haşim |last2=Senior |first2=Andrew |date=September 2015 |last3=Rao |first3=Kanishka |last4=Beaufays |first4=Françoise |last5=Schalkwyk |first5=Johan}}</ref> It broke records in [[machine translation]],<ref name="sutskever2014">{{Cite journal |last1=Sutskever |first1=Ilya |last2=Vinyals |first2=Oriol |last3=Le |first3=Quoc V. |year=2014 |title=Sequence to Sequence Learning with Neural Networks |url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf |journal=Electronic Proceedings of the Neural Information Processing Systems Conference |volume=27 |page=5346 |arxiv=1409.3215 |bibcode=2014arXiv1409.3215S }}</ref> [[Language Modeling|language modeling]],<ref name="vinyals2016">{{cite arXiv |last1=Jozefowicz |first1=Rafal |last2=Vinyals |first2=Oriol |last3=Schuster |first3=Mike |last4=Shazeer |first4=Noam |last5=Wu |first5=Yonghui |date=2016-02-07 |title=Exploring the Limits of Language Modeling |eprint=1602.02410 |class=cs.CL}}</ref> and multilingual language processing.<ref name="gillick2015">{{cite arXiv |last1=Gillick |first1=Dan |last2=Brunk |first2=Cliff |last3=Vinyals |first3=Oriol |last4=Subramanya |first4=Amarnag |date=2015-11-30 |title=Multilingual Language Processing From Bytes |eprint=1512.00103 |class=cs.CL}}</ref> LSTM combined with [[convolutional neural network]]s (CNNs) also improved [[automatic image captioning]].<ref name="vinyals2015">{{cite arXiv |last1=Vinyals |first1=Oriol |last2=Toshev |first2=Alexander |last3=Bengio |first3=Samy |last4=Erhan |first4=Dumitru |date=2014-11-17 |title=Show and Tell: A Neural Image Caption Generator |eprint=1411.4555 |class=cs.CV }}</ref>
The idea of encoder-decoder sequence transduction was developed in the early 2010s. Two papers from 2014 are most commonly cited as the originators of seq2seq.<ref name=":2">{{Cite arXiv |last1=Cho |first1=Kyunghyun |last2=van Merrienboer |first2=Bart |last3=Gulcehre |first3=Caglar |last4=Bahdanau |first4=Dzmitry |last5=Bougares |first5=Fethi |last6=Schwenk |first6=Holger |last7=Bengio |first7=Yoshua |date=2014-06-03 |title=Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation |class=cs.CL |eprint =1406.1078}}</ref><ref name="sequence">{{cite arXiv |eprint=1409.3215 |class=cs.CL |first1=Ilya |last1=Sutskever |first2=Oriol |last2=Vinyals |title=Sequence to sequence learning with neural networks |date=14 Dec 2014 |last3=Le |first3=Quoc Viet}} [first version posted to arXiv on 10 Sep 2014]</ref> A [[seq2seq]] architecture employs two RNNs, typically LSTMs, an "encoder" and a "decoder", for sequence transduction tasks such as machine translation. Such models became state of the art in machine translation and were instrumental in the development of [[Attention (machine learning)|attention
==Configurations==