Revision as of 06:01, 17 April 2025 by Citation bot: Add: pages, issue, volume. \| Use this bot. Report bugs. \| Suggested by Dominic3203 \| Linked from User:LinguisticMystic/cs/outline \| #UCB_webform_linked 1736/2277
Revision as of 15:35, 15 May 2025 by TheTechnician27 (minor): 'Transformer's isn't a proper noun because this is about ML, not Optimus Prime
Line 9:
However, traditional RNNs suffer from the [[vanishing gradient problem]], which limits their ability to learn long-range dependencies. This issue was addressed by the development of the [[long short-term memory]] (LSTM) architecture in 1997, making it the standard RNN variant for handling long-term dependencies. Later, [[Gated recurrent unit\|Gated Recurrent Units]] (GRUs) were introduced as a more computationally efficient alternative. In recent years, [[Transformer (deep learning architecture)\|~~Transformers~~transformers]], which rely on self-attention mechanisms instead of recurrence, have become the dominant architecture for many sequence-processing tasks, particularly in natural language processing, due to their superior handling of long-range dependencies and greater parallelizability. Nevertheless, RNNs remain relevant for applications where computational efficiency, real-time processing, or the inherent sequential nature of data is crucial.

==History==
Line 34:
Around 2006, bidirectional LSTM started to revolutionize [[speech recognition]], outperforming traditional models in certain speech applications.<ref>{{Cite journal \|last1=Graves \|first1=Alex \|last2=Schmidhuber \|first2=Jürgen \|date=2005-07-01 \|title=Framewise phoneme classification with bidirectional LSTM and other neural network architectures \|journal=Neural Networks \|series=IJCNN 2005 \|volume=18 \|issue=5 \|pages=602–610 \|citeseerx=10.1.1.331.5800 \|doi=10.1016/j.neunet.2005.06.042 \|pmid=16112549 \|s2cid=1856462}}</ref><ref name="fernandez2007keyword">{{Cite conference \|last1=Fernández \|first1=Santiago \|last2=Graves \|first2=Alex \|last3=Schmidhuber \|first3=Jürgen \|year=2007 \|title=An Application of Recurrent Neural Networks to Discriminative Keyword Spotting \|url=http://dl.acm.org/citation.cfm?id=1778066.1778092 \|book-title=Proceedings of the 17th International Conference on Artificial Neural Networks \|series=ICANN'07 \|location=Berlin, Heidelberg 
\|publisher=Springer-Verlag \|pages=220–229 \|isbn=978-3-540-74693-5 }}</ref> They also improved large-vocabulary speech recognition<ref name="sak2014" /><ref name="liwu2015" /> and [[text-to-speech]] synthesis<ref name="fan2015">{{cite conference \|last1=Fan \|first1=Bo \|last2=Wang \|first2=Lijuan \|last3=Soong \|first3=Frank K. \|last4=Xie \|first4=Lei \|title=Photo-Real Talking Head with Deep Bidirectional LSTM \|chapter-url= \|editor= \|book-title=Proceedings of ICASSP 2015 IEEE International Conference on Acoustics, Speech and Signal Processing \|doi=10.1109/ICASSP.2015.7178899 \|date=2015 \|isbn=978-1-4673-6997-8 \|pages=4884–8 }}</ref> and were used in [[Google Voice Search\|Google voice search]] and dictation on [[Android (operating system)\|Android devices]].<ref name="sak2015">{{Cite web \|url=http://googleresearch.blogspot.ch/2015/09/google-voice-search-faster-and-more.html \|title=Google voice search: faster and more accurate \|last1=Sak \|first1=Haşim \|last2=Senior \|first2=Andrew \|date=September 2015 \|last3=Rao \|first3=Kanishka \|last4=Beaufays \|first4=Françoise \|last5=Schalkwyk \|first5=Johan}}</ref> They broke records in [[machine translation]],<ref name="sutskever2014">{{Cite journal \|last1=Sutskever \|first1=Ilya \|last2=Vinyals \|first2=Oriol \|last3=Le \|first3=Quoc V. 
\|year=2014 \|title=Sequence to Sequence Learning with Neural Networks \|url=https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf \|journal=Electronic Proceedings of the Neural Information Processing Systems Conference \|volume=27 \|page=5346 \|arxiv=1409.3215 \|bibcode=2014arXiv1409.3215S }}</ref> [[Language Modeling\|language modeling]]<ref name="vinyals2016">{{cite arXiv \|last1=Jozefowicz \|first1=Rafal \|last2=Vinyals \|first2=Oriol \|last3=Schuster \|first3=Mike \|last4=Shazeer \|first4=Noam \|last5=Wu \|first5=Yonghui \|date=2016-02-07 \|title=Exploring the Limits of Language Modeling \|eprint=1602.02410 \|class=cs.CL}}</ref> and multilingual language processing.<ref name="gillick2015">{{cite arXiv \|last1=Gillick \|first1=Dan \|last2=Brunk \|first2=Cliff \|last3=Vinyals \|first3=Oriol \|last4=Subramanya \|first4=Amarnag \|date=2015-11-30 \|title=Multilingual Language Processing From Bytes \|eprint=1512.00103 \|class=cs.CL}}</ref> Also, LSTM combined with [[convolutional neural network]]s (CNNs) improved [[automatic image captioning]].<ref name="vinyals2015">{{cite arXiv \|last1=Vinyals \|first1=Oriol \|last2=Toshev \|first2=Alexander \|last3=Bengio \|first3=Samy \|last4=Erhan \|first4=Dumitru \|date=2014-11-17 \|title=Show and Tell: A Neural Image Caption Generator \|eprint=1411.4555 \|class=cs.CV }}</ref> The idea of encoder-decoder sequence transduction had been developed in the early 2010s. 
The two papers most commonly cited as the origin of seq2seq were both published in 2014.<ref name=":2">{{Cite arXiv \|last1=Cho \|first1=Kyunghyun \|last2=van Merrienboer \|first2=Bart \|last3=Gulcehre \|first3=Caglar \|last4=Bahdanau \|first4=Dzmitry \|last5=Bougares \|first5=Fethi \|last6=Schwenk \|first6=Holger \|last7=Bengio \|first7=Yoshua \|date=2014-06-03 \|title=Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation \|class=cs.CL \|eprint=1406.1078}}</ref><ref name="sequence">{{cite arXiv \|eprint=1409.3215 \|class=cs.CL \|first1=Ilya \|last1=Sutskever \|first2=Oriol \|last2=Vinyals \|title=Sequence to sequence learning with neural networks \|date=14 Dec 2014 \|last3=Le \|first3=Quoc Viet}} [first version posted to arXiv on 10 Sep 2014]</ref> A [[seq2seq]] architecture employs two RNNs, typically LSTMs, an "encoder" and a "decoder", for sequence transduction, such as machine translation. Such architectures became state of the art in machine translation and were instrumental in the development of [[Attention (machine learning)\|attention mechanisms]] and [[Transformer (deep learning architecture)\|~~Transformers~~transformers]].

==Configurations==
Line 79:
[[File:Seq2seq_RNN_encoder-decoder_with_attention_mechanism,_training.png\|thumb\|Encoder-decoder RNN with attention mechanism.]]
Two RNNs can be run front-to-back in an '''encoder-decoder''' configuration. The encoder RNN processes an input sequence into a sequence of hidden vectors, and the decoder RNN processes the sequence of hidden vectors to an output sequence, with an optional [[Attention (machine learning)\|attention mechanism]]. This was used to construct state-of-the-art [[Neural machine translation\|neural machine translators]] during the 2014–2017 period. 
This was an instrumental step towards the development of [[Transformer (deep learning architecture)\|~~Transformers~~transformers]].<ref>{{Cite journal \|last1=Vaswani \|first1=Ashish \|last2=Shazeer \|first2=Noam \|last3=Parmar \|first3=Niki \|last4=Uszkoreit \|first4=Jakob \|last5=Jones \|first5=Llion \|last6=Gomez \|first6=Aidan N \|last7=Kaiser \|first7=Łukasz \|last8=Polosukhin \|first8=Illia \|date=2017 \|title=Attention is All you Need \|url=https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html \|journal=Advances in Neural Information Processing Systems \|publisher=Curran Associates, Inc. \|volume=30}}</ref>

=== PixelRNN ===

Recurrent neural network: Difference between revisions