However, traditional RNNs suffer from the [[vanishing gradient problem]], which limits their ability to learn long-range dependencies. The [[long short-term memory]] (LSTM) architecture, introduced in 1997, addressed this issue and became the standard RNN variant for handling long-term dependencies. Later, [[Gated recurrent unit|gated recurrent units]] (GRUs) were introduced as a more computationally efficient alternative.
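The vanishing gradient can be illustrated with a deliberately simplified toy model, assumed here for illustration only: a scalar RNN with update h_t = tanh(w·h_{t−1} + x_t) and a fixed recurrent weight w. By the chain rule, the gradient of the last hidden state with respect to the first is a product of per-step factors w·(1 − h_t²). When |w| < 1, every factor is smaller than 1 in magnitude, so the product shrinks at least as fast as |w|^T over T steps. The function name <code>gradient_through_time</code> is hypothetical.

```python
import math


def gradient_through_time(w, xs, h0=0.0):
    """Return d h_T / d h_0 for the toy scalar RNN h_t = tanh(w * h_{t-1} + x_t).

    This is a hypothetical illustration, not a published architecture.
    Since d tanh(z)/dz = 1 - tanh(z)**2, each step contributes the local
    Jacobian w * (1 - h_t**2), and the chain rule multiplies them together.
    """
    h = h0
    grad = 1.0
    for x in xs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # local Jacobian d h_t / d h_{t-1}
    return grad


if __name__ == "__main__":
    # Same weight and constant input, increasingly long sequences.
    for steps in (1, 10, 50):
        g = gradient_through_time(w=0.9, xs=[0.5] * steps)
        print(f"{steps:3d} steps: dh_T/dh_0 = {g:.3e}")
```

Running it shows the gradient falling by orders of magnitude as the sequence grows, which is why an error signal from late time steps barely affects how early inputs are processed.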
In recent years, [[Transformer (deep learning architecture)|
==History==
Around 2006, bidirectional LSTMs started to revolutionize [[speech recognition]], outperforming traditional models in certain speech applications.<ref>{{Cite journal |last1=Graves |first1=Alex |last2=Schmidhuber |first2=Jürgen |date=2005-07-01 |title=Framewise phoneme classification with bidirectional LSTM and other neural network architectures |journal=Neural Networks |series=IJCNN 2005 |volume=18 |issue=5 |pages=602–610 |citeseerx=10.1.1.331.5800 |doi=10.1016/j.neunet.2005.06.042 |pmid=16112549 |s2cid=1856462}}</ref><ref name="fernandez2007keyword">{{Cite conference |last1=Fernández |first1=Santiago |last2=Graves |first2=Alex |last3=Schmidhuber |first3=Jürgen |year=2007 |title=An Application of Recurrent Neural Networks to Discriminative Keyword Spotting |url=http://dl.acm.org/citation.cfm?id=1778066.1778092 |book-title=Proceedings of the 17th International Conference on Artificial Neural Networks |series=ICANN'07 |location=Berlin, Heidelberg |publisher=Springer-Verlag |pages=220–229 |isbn=978-3-540-74693-5 }}</ref> They also improved large-vocabulary speech recognition<ref name="sak2014" /><ref name="liwu2015" /> and [[text-to-speech]] synthesis<ref name="fan2015">{{cite conference |last1=Fan |first1=Bo |last2=Wang |first2=Lijuan |last3=Soong |first3=Frank K.
|last4=Xie |first4=Lei |title=Photo-Real Talking Head with Deep Bidirectional LSTM |book-title=Proceedings of ICASSP 2015 IEEE International Conference on Acoustics, Speech and Signal Processing |doi=10.1109/ICASSP.2015.7178899 |date=2015 |isbn=978-1-4673-6997-8 |pages=4884–4888 }}</ref> and were used in [[Google Voice Search|Google voice search]] and dictation on [[Android (operating system)|Android devices]].<ref name="sak2015">{{Cite web |url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html |title=Google voice search: faster and more accurate |last1=Sak |first1=Haşim |last2=Senior |first2=Andrew |date=September 2015 |last3=Rao |first3=Kanishka |last4=Beaufays |first4=Françoise |last5=Schalkwyk |first5=Johan}}</ref> They set records in [[machine translation]],<ref name="sutskever2014">{{Cite journal |last1=Sutskever |first1=Ilya |last2=Vinyals |first2=Oriol |last3=Le |first3=Quoc V. |year=2014 |title=Sequence to Sequence Learning with Neural Networks |url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf |journal=Electronic Proceedings of the Neural Information Processing Systems Conference |volume=27 |page=5346 |arxiv=1409.3215 |bibcode=2014arXiv1409.3215S }}</ref> [[Language Modeling|language modeling]]<ref name="vinyals2016">{{cite arXiv |last1=Jozefowicz |first1=Rafal |last2=Vinyals |first2=Oriol |last3=Schuster |first3=Mike |last4=Shazeer |first4=Noam |last5=Wu |first5=Yonghui |date=2016-02-07 |title=Exploring the Limits of Language Modeling |eprint=1602.02410 |class=cs.CL}}</ref> and multilingual language processing.<ref name="gillick2015">{{cite arXiv |last1=Gillick |first1=Dan |last2=Brunk |first2=Cliff |last3=Vinyals |first3=Oriol |last4=Subramanya |first4=Amarnag |date=2015-11-30 |title=Multilingual Language Processing From Bytes |eprint=1512.00103 |class=cs.CL}}</ref> LSTMs combined with [[convolutional neural network]]s (CNNs) also improved 
[[automatic image captioning]].<ref name="vinyals2015">{{cite arXiv |last1=Vinyals |first1=Oriol |last2=Toshev |first2=Alexander |last3=Bengio |first3=Samy |last4=Erhan |first4=Dumitru |date=2014-11-17 |title=Show and Tell: A Neural Image Caption Generator |eprint=1411.4555 |class=cs.CV }}</ref>
The idea of encoder-decoder sequence transduction was developed in the early 2010s. The two papers most commonly cited as the origin of seq2seq were both published in 2014.<ref name=":2">{{Cite arXiv |last1=Cho |first1=Kyunghyun |last2=van Merrienboer |first2=Bart |last3=Gulcehre |first3=Caglar |last4=Bahdanau |first4=Dzmitry |last5=Bougares |first5=Fethi |last6=Schwenk |first6=Holger |last7=Bengio |first7=Yoshua |date=2014-06-03 |title=Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation |class=cs.CL |eprint=1406.1078}}</ref><ref name="sequence">{{cite arXiv |eprint=1409.3215 |class=cs.CL |first1=Ilya |last1=Sutskever |first2=Oriol |last2=Vinyals |title=Sequence to sequence learning with neural networks |date=14 Dec 2014 |last3=Le |first3=Quoc Viet}} [first version posted to arXiv on 10 Sep 2014]</ref> A [[seq2seq]] architecture employs two RNNs, typically LSTMs, an "encoder" and a "decoder", for sequence transduction tasks such as machine translation. Such models became the state of the art in machine translation and were instrumental in the development of [[Attention (machine learning)|attention mechanisms]] and [[Transformer (deep learning architecture)|
==Configurations==
[[File:Seq2seq_RNN_encoder-decoder_with_attention_mechanism,_training.png|thumb|Encoder-decoder RNN with attention mechanism.]]
Two RNNs can be run front-to-back in an '''encoder-decoder''' configuration. The encoder RNN processes an input sequence into a sequence of hidden vectors, and the decoder RNN processes the sequence of hidden vectors into an output sequence, with an optional [[Attention (machine learning)|attention mechanism]]. This configuration was used to construct state-of-the-art [[Neural machine translation|neural machine translators]] during the 2014–2017 period and was an instrumental step towards the development of [[Transformer (deep learning architecture)|
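The encoder-decoder configuration with attention can be sketched in plain Python as follows. This is a minimal, untrained illustration under stated assumptions, not the method of any particular paper: the weights are random, the decoder state is initialised from the final encoder state, attention scores are plain dot products (one of several scoring functions in use; the mechanism of Bahdanau et al. used an additive score), and the output projection and token feedback that a real translator would need are omitted. All function names (<code>encode</code>, <code>attend</code>, <code>decode</code>, <code>rand_matrix</code>) are hypothetical.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vs)]


def tanh_vec(v):
    return [math.tanh(x) for x in v]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    """Hypothetical helper: random, untrained weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def encode(inputs, W_in, W_rec):
    """Encoder RNN: h_t = tanh(W_in x_t + W_rec h_{t-1}); returns every h_t."""
    h = [0.0] * len(W_rec)
    states = []
    for x in inputs:
        h = tanh_vec(add(matvec(W_in, x), matvec(W_rec, h)))
        states.append(h)
    return states


def attend(query, keys):
    """Dot-product attention: softmax weights and weighted sum of encoder states."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    context = [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(len(query))]
    return weights, context


def decode(enc_states, steps, W_ctx, W_rec):
    """Decoder RNN: s_t = tanh(W_rec s_{t-1} + W_ctx c_t), where c_t attends using s_{t-1}."""
    s = enc_states[-1]  # assumption: start from the final encoder state
    outputs, all_weights = [], []
    for _ in range(steps):
        weights, context = attend(s, enc_states)
        s = tanh_vec(add(matvec(W_rec, s), matvec(W_ctx, context)))
        outputs.append(s)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    rng = random.Random(0)
    d_in, d_h = 3, 4
    W_in, W_enc = rand_matrix(d_h, d_in, rng), rand_matrix(d_h, d_h, rng)
    W_ctx, W_dec = rand_matrix(d_h, d_h, rng), rand_matrix(d_h, d_h, rng)
    source = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(5)]
    enc = encode(source, W_in, W_enc)
    outs, attn = decode(enc, steps=3, W_ctx=W_ctx, W_rec=W_dec)
    for t, w in enumerate(attn):
        print(t, [round(x, 3) for x in w])  # one weight per source position
```

Each printed row is the attention distribution of one decoder step over the five source positions and sums to 1. Without attention, the decoder would have to rely on the final encoder state alone; attention lets every decoding step look back at all encoder states.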
=== PixelRNN ===