Recurrent neural network: revision as of 17:47, 30 January 2025 by 146.229.122.59 (Modern: plurals to link, grammar correction), compared with the revision as of 05:19, 7 January 2025 by Stephan Leeds (en dash to hyphen for affix; link to redirect).
Around 2006, bidirectional LSTM started to revolutionize [[speech recognition]], outperforming traditional models in certain speech applications.<ref>{{Cite journal |last1=Graves |first1=Alex |last2=Schmidhuber |first2=Jürgen |date=2005-07-01 |title=Framewise phoneme classification with bidirectional LSTM and other neural network architectures |journal=Neural Networks |series=IJCNN 2005 |volume=18 |issue=5 |pages=602–610 |citeseerx=10.1.1.331.5800 |doi=10.1016/j.neunet.2005.06.042 |pmid=16112549 |s2cid=1856462}}</ref><ref name="fernandez2007keyword">{{Cite conference |last1=Fernández |first1=Santiago |last2=Graves |first2=Alex |last3=Schmidhuber |first3=Jürgen |year=2007 |title=An Application of Recurrent Neural Networks to Discriminative Keyword Spotting |url=http://dl.acm.org/citation.cfm?id=1778066.1778092 |book-title=Proceedings of the 17th International Conference on Artificial Neural Networks |series=ICANN'07 |location=Berlin, Heidelberg |publisher=Springer-Verlag |pages=220–229 |isbn=978-3-540-74693-5 }}</ref> They also improved large-vocabulary speech recognition<ref name="sak2014" /><ref name="liwu2015" /> and [[text-to-speech]] synthesis<ref name="fan2015">{{cite conference |last1=Fan |first1=Bo |last2=Wang |first2=Lijuan |last3=Soong |first3=Frank K. |last4=Xie |first4=Lei |title=Photo-Real Talking Head with Deep Bidirectional LSTM |chapter-url= |editor= |book-title=Proceedings of ICASSP 2015 IEEE International Conference on Acoustics, Speech and Signal Processing |doi=10.1109/ICASSP.2015.7178899 |date=2015 |isbn=978-1-4673-6997-8 |pages=4884–8 }}</ref> and were used in [[Google Voice Search|Google voice search]] and dictation on [[Android (operating system)|Android devices]].<ref name="sak2015">{{Cite web |url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html |title=Google voice search: faster and more accurate |last1=Sak |first1=Haşim |last2=Senior |first2=Andrew |date=September 2015 |last3=Rao |first3=Kanishka |last4=Beaufays |first4=Françoise |last5=Schalkwyk |first5=Johan}}</ref> They set new records in [[machine translation]],<ref name="sutskever2014">{{Cite journal |last1=Sutskever |first1=Ilya |last2=Vinyals |first2=Oriol |last3=Le |first3=Quoc V. |year=2014 |title=Sequence to Sequence Learning with Neural Networks |url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf |journal=Electronic Proceedings of the Neural Information Processing Systems Conference |volume=27 |page=5346 |arxiv=1409.3215 |bibcode=2014arXiv1409.3215S }}</ref> [[Language Modeling|language modeling]]<ref name="vinyals2016">{{cite arXiv |last1=Jozefowicz |first1=Rafal |last2=Vinyals |first2=Oriol |last3=Schuster |first3=Mike |last4=Shazeer |first4=Noam |last5=Wu |first5=Yonghui |date=2016-02-07 |title=Exploring the Limits of Language Modeling |eprint=1602.02410 |class=cs.CL}}</ref> and multilingual language processing.<ref name="gillick2015">{{cite arXiv |last1=Gillick |first1=Dan |last2=Brunk |first2=Cliff |last3=Vinyals |first3=Oriol |last4=Subramanya |first4=Amarnag |date=2015-11-30 |title=Multilingual Language Processing From Bytes |eprint=1512.00103 |class=cs.CL}}</ref> LSTM combined with [[convolutional neural network]]s (CNNs) also improved [[automatic image captioning]].<ref name="vinyals2015">{{cite arXiv |last1=Vinyals |first1=Oriol |last2=Toshev |first2=Alexander |last3=Bengio |first3=Samy |last4=Erhan |first4=Dumitru |date=2014-11-17 |title=Show and Tell: A Neural Image Caption Generator |eprint=1411.4555 |class=cs.CV }}</ref> The idea of encoder-decoder sequence transduction had been developed in the early 2010s. The papers most commonly cited as the origin of seq2seq are two papers from 2014.<ref name=":2">{{Cite arXiv |last1=Cho |first1=Kyunghyun |last2=van Merrienboer |first2=Bart |last3=Gulcehre |first3=Caglar |last4=Bahdanau |first4=Dzmitry |last5=Bougares |first5=Fethi |last6=Schwenk |first6=Holger |last7=Bengio |first7=Yoshua |date=2014-06-03 |title=Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation |class=cs.CL |eprint=1406.1078}}</ref><ref name="sequence">{{cite arXiv |eprint=1409.3215 |class=cs.CL |first1=Ilya |last1=Sutskever |first2=Oriol |last2=Vinyals |title=Sequence to sequence learning with neural networks |date=14 Dec 2014 |last3=Le |first3=Quoc Viet}} [first version posted to arXiv on 10 Sep 2014]</ref> A [[seq2seq]] architecture employs two RNNs, typically LSTMs, an "encoder" and a "decoder", for sequence transduction such as machine translation: the encoder reads the input sequence into a fixed-length vector, from which the decoder generates the output sequence. Seq2seq models became state of the art in machine translation and were instrumental in the development of [[Attention (machine learning)|attention mechanisms]] and [[Transformer (deep learning architecture)|Transformers]].

==Configurations==
