===Before modern===
One origin of RNNs was neuroscience. The word "recurrent" is used to describe loop-like structures in anatomy. In 1901, [[Santiago Ramón y Cajal|Cajal]] observed "recurrent semicircles" in the [[Cerebellum|cerebellar cortex]] formed by [[parallel fiber]]s, [[Purkinje cell|Purkinje cells]], and [[granule cell|granule cells]].<ref>{{Cite journal |last1=Espinosa-Sanchez |first1=Juan Manuel |last2=Gomez-Marin |first2=Alex |last3=de Castro |first3=Fernando |date=2023-07-05 |title=The Importance of Cajal's and Lorente de Nó's Neuroscience to the Birth of Cybernetics |url=http://journals.sagepub.com/doi/10.1177/10738584231179932 |journal=The Neuroscientist |language=en |doi=10.1177/10738584231179932 |pmid=37403768 |hdl=10261/348372 |issn=1073-8584|hdl-access=free }}</ref><ref>{{Cite book |last=Ramón y Cajal |first=Santiago |url=https://archive.org/details/b2129592x_0002/page/n159/mode/2up |title=Histologie du système nerveux de l'homme & des vertébrés |date=1909 |publisher=Paris : A. Maloine |others=Foyle Special Collections Library King's College London |volume=II |pages=149}}</ref> In 1933, [[Rafael Lorente de Nó|Lorente de Nó]] discovered "recurrent, reciprocal connections" using [[Golgi's method]], and proposed that excitatory loops explain certain aspects of the [[vestibulo-ocular reflex]].<ref>{{Cite journal |last=de NÓ |first=R. Lorente |date=1933-08-01 |title=Vestibulo-Ocular Reflex Arc |url=http://archneurpsyc.jamanetwork.com/article.aspx?doi=10.1001/archneurpsyc.1933.02240140009001 |journal=Archives of Neurology and Psychiatry |volume=30 |issue=2 |pages=245 |doi=10.1001/archneurpsyc.1933.02240140009001 |issn=0096-6754}}</ref><ref>{{Cite journal |last=Larriva-Sahd |first=Jorge A. 
|date=2014-12-03 |title=Some predictions of Rafael Lorente de Nó 80 years later |journal=Frontiers in Neuroanatomy |volume=8 |pages=147 |doi=10.3389/fnana.2014.00147 |doi-access=free |issn=1662-5129 |pmc=4253658 |pmid=25520630}}</ref> During the 1940s, multiple researchers proposed the existence of feedback in the brain, in contrast to the previous understanding of the neural system as a purely feedforward structure. [[Donald O. Hebb|Hebb]] considered the "reverberating circuit" as an explanation for short-term memory.<ref>{{Cite web |title=reverberating circuit |url=https://www.oxfordreference.com/display/10.1093/oi/authority.20110803100417461 |access-date=2024-07-27 |website=Oxford Reference }}</ref> The McCulloch and Pitts paper (1943), which proposed the [[McCulloch-Pitts neuron]] model, considered networks that contain cycles. The current activity of such networks can be affected by activity indefinitely far in the past.<ref>{{Cite journal |last1=McCulloch |first1=Warren S. |last2=Pitts |first2=Walter |date=December 1943 |title=A logical calculus of the ideas immanent in nervous activity |url=http://link.springer.com/10.1007/BF02478259 |journal=The Bulletin of Mathematical Biophysics |volume=5 |issue=4 |pages=115–133 |doi=10.1007/BF02478259 |issn=0007-4985}}</ref> They were both interested in closed loops as possible explanations for, e.g., [[epilepsy]] and [[Complex regional pain syndrome|causalgia]].<ref>{{Cite journal |last1=Moreno-Díaz |first1=Roberto |last2=Moreno-Díaz |first2=Arminda |date=April 2007 |title=On the legacy of W.S. 
McCulloch |url=https://linkinghub.elsevier.com/retrieve/pii/S0303264706002152 |journal=Biosystems |volume=88 |issue=3 |pages=185–190 |doi=10.1016/j.biosystems.2006.08.010|pmid=17184902 |bibcode=2007BiSys..88..185M }}</ref><ref>{{Cite journal |last=Arbib |first=Michael A |date=December 2000 |title=Warren McCulloch's Search for the Logic of the Nervous System |url=https://muse.jhu.edu/article/46496 |journal=Perspectives in Biology and Medicine |volume=43 |issue=2 |pages=193–216 |doi=10.1353/pbm.2000.0001 |pmid=10804585 |issn=1529-8795}}</ref> [[Renshaw cell|Recurrent inhibition]] was proposed in 1946 as a negative feedback mechanism in motor control. Neural feedback loops were a common topic of discussion at the [[Macy conferences]].<ref>{{Cite journal |last=Renshaw |first=Birdsey |date=1946-05-01 |title=Central Effects of Centripetal Impulses in Axons of Spinal Ventral Roots |url=https://www.physiology.org/doi/10.1152/jn.1946.9.3.191 |journal=Journal of Neurophysiology |volume=9 |issue=3 |pages=191–204 |doi=10.1152/jn.1946.9.3.191 |pmid=21028162 |issn=0022-3077}}</ref> See <ref name=":0">{{Cite journal |last=Grossberg |first=Stephen |date=2013-02-22 |title=Recurrent Neural Networks |journal=Scholarpedia |volume=8 |issue=2 |pages=1888 |doi=10.4249/scholarpedia.1888 |doi-access=free |bibcode=2013SchpJ...8.1888G |issn=1941-6016}}</ref> for an extensive review of recurrent neural network models in neuroscience.[[File:Typical_connections_in_a_close-loop_cross-coupled_perceptron.png|thumb|A close-loop cross-coupled perceptron network.<ref name=":1" />{{Pg|page=403|___location=Fig. 47}}.]]
[[Frank Rosenblatt]] in 1960 published "close-loop cross-coupled perceptrons", which are 3-layered [[perceptron]] networks whose middle layer contains recurrent connections that change by a [[Hebbian theory|Hebbian learning]] rule.<ref>F. Rosenblatt, "[[iarchive:SelfOrganizingSystems/page/n87/mode/1up|Perceptual Generalization over Transformation Groups]]", pp. 63--100 in ''Self-organizing Systems: Proceedings of an Inter-disciplinary Conference, 5 and 6 May, 1959''. Edited by Marshall C. Yovitz and Scott Cameron. London, New York, [etc.], Pergamon Press, 1960. ix, 322 p.</ref>{{Pg|pages=73-75}} Later, in ''Principles of Neurodynamics'' (1961), he described "closed-loop cross-coupled" and "back-coupled" perceptron networks, presented theoretical and experimental studies of Hebbian learning in these networks,<ref name=":1">{{Cite book |last=Rosenblatt |first=Frank |url=https://archive.org/details/DTIC_AD0256582/page/n3/mode/2up |title=DTIC AD0256582: PRINCIPLES OF NEURODYNAMICS. PERCEPTRONS AND THE THEORY OF BRAIN MECHANISMS |date=1961-03-15 |publisher=Defense Technical Information Center |language=english}}</ref>{{Pg|___location=Chapter 19, 21}} and noted that a fully cross-coupled perceptron network is equivalent to an infinitely deep feedforward network.<ref name=":1" />{{Pg|___location=Section 19.11}}
Similar networks were published by Kaoru Nakano in 1971,<ref name="Nakano1971">{{cite book |last1=Nakano |first1=Kaoru |title=Pattern Recognition and Machine Learning |date=1971 |isbn=978-1-4615-7568-9 |pages=172–186 |chapter=Learning Process in a Model of Associative Memory |doi=10.1007/978-1-4615-7566-5_15}}</ref><ref name="Nakano1972">{{cite journal |last1=Nakano |first1=Kaoru |date=1972 |title=Associatron-A Model of Associative Memory |journal=IEEE Transactions on Systems, Man, and Cybernetics |volume=SMC-2 |issue=3 |pages=380–388 |doi=10.1109/TSMC.1972.4309133}}</ref> [[Shun'ichi Amari]] in 1972,<ref name="Amari1972">{{cite journal |last1=Amari |first1=Shun-Ichi |date=1972 |title=Learning patterns and pattern sequences by self-organizing nets of threshold elements |journal=IEEE Transactions on Computers |volume=C-21 |issue=11 |pages=1197–1206}}</ref> and {{ill|William A. Little (physicist)|lt=William A. Little|de|William A. Little}} in 1974,<ref name="little74">{{cite journal |last=Little |first=W. A. |year=1974 |title=The Existence of Persistent States in the Brain |journal=Mathematical Biosciences |volume=19 |issue=1–2 |pages=101–120 |doi=10.1016/0025-5564(74)90031-5}}</ref> who was acknowledged by Hopfield in his 1982 paper.
Another origin of RNNs was [[statistical mechanics]]. The [[Ising model]] was developed by [[Wilhelm Lenz]]<ref name="lenz1920">{{Citation |last=Lenz |first=W. |title=Beiträge zum Verständnis der magnetischen Eigenschaften in festen Körpern |journal=Physikalische Zeitschrift |volume=21 |pages=613–615 |year=1920 |postscript=. |author-link=Wilhelm Lenz}}</ref> and [[Ernst Ising]]<ref name="ising1925">{{citation |last=Ising |first=E. |title=Beitrag zur Theorie des Ferromagnetismus |journal=Z. Phys. |volume=31 |issue=1 |pages=253–258 |year=1925 |bibcode=1925ZPhy...31..253I |doi=10.1007/BF02980577 |s2cid=122157319}}</ref> in the 1920s<ref>{{cite journal |last1=Brush |first1=Stephen G. |year=1967 |title=History of the Lenz-Ising Model |journal=Reviews of Modern Physics |volume=39 |issue=4 |pages=883–893 |bibcode=1967RvMP...39..883B |doi=10.1103/RevModPhys.39.883}}</ref> as a simple statistical mechanical model of magnets at equilibrium. [[Roy J. Glauber|Glauber]] in 1963 studied the Ising model evolving in time as a process towards equilibrium ([[Glauber dynamics]]), adding the dimension of time to the model.<ref name=":22">{{cite journal |last1=Glauber |first1=Roy J. |date=February 1963 |title=Time-Dependent Statistics of the Ising Model |url=https://aip.scitation.org/doi/abs/10.1063/1.1703954 |journal=Journal of Mathematical Physics |volume=4 |issue=2 |pages=294–307 |doi=10.1063/1.1703954 |access-date=2021-03-21}}</ref>
The [[Spin glass|Sherrington–Kirkpatrick model]] of spin glass, published in 1975,<ref>{{Cite journal |last1=Sherrington |first1=David |last2=Kirkpatrick |first2=Scott |date=1975-12-29 |title=Solvable Model of a Spin-Glass |url=https://link.aps.org/doi/10.1103/PhysRevLett.35.1792 |journal=Physical Review Letters |volume=35 |issue=26 |pages=1792–1796 |doi=10.1103/PhysRevLett.35.1792 |bibcode=1975PhRvL..35.1792S |issn=0031-9007}}</ref> is the Hopfield network with randomly chosen coupling weights. Sherrington and Kirkpatrick found that it is highly likely for the energy function of the SK model to have many local minima. In his 1982 paper, Hopfield applied this recently developed theory to study the Hopfield network with binary activation functions.<ref name="Hopfield19822">{{cite journal |last1=Hopfield |first1=J. J. |date=1982 |title=Neural networks and physical systems with emergent collective computational abilities |journal=Proceedings of the National Academy of Sciences |volume=79 |issue=8 |pages=2554–2558 |bibcode=1982PNAS...79.2554H |doi=10.1073/pnas.79.8.2554 |pmc=346238 |pmid=6953413 |doi-access=free}}</ref> In a 1984 paper he extended this to continuous activation functions.<ref name=":02">{{cite journal |last1=Hopfield |first1=J. J. |date=1984 |title=Neurons with graded response have collective computational properties like those of two-state neurons |journal=Proceedings of the National Academy of Sciences |volume=81 |issue=10 |pages=3088–3092 |bibcode=1984PNAS...81.3088H |doi=10.1073/pnas.81.10.3088 |pmc=345226 |pmid=6587342 |doi-access=free}}</ref> The Hopfield network became a standard model for the study of neural networks through statistical mechanics.<ref>{{Cite book |last1=Engel |first1=A. |title=Statistical mechanics of learning |last2=Broeck |first2=C. van den |date=2001 |publisher=Cambridge University Press |isbn=978-0-521-77307-2 |___location=Cambridge, UK ; New York, NY}}</ref><ref>{{Cite journal |last1=Seung |first1=H. S. |last2=Sompolinsky |first2=H. 
|last3=Tishby |first3=N. |date=1992-04-01 |title=Statistical mechanics of learning from examples |url=https://journals.aps.org/pra/abstract/10.1103/PhysRevA.45.6056 |journal=Physical Review A |volume=45 |issue=8 |pages=6056–6091 |doi=10.1103/PhysRevA.45.6056|pmid=9907706 |bibcode=1992PhRvA..45.6056S }}</ref>
===Modern===
[[File:Seq2seq_RNN_encoder-decoder_with_attention_mechanism,_training_and_inferring.png|thumb|Encoder-decoder RNN without attention mechanism.]]
[[File:Seq2seq_RNN_encoder-decoder_with_attention_mechanism,_training.png|thumb|Encoder-decoder RNN with attention mechanism.]]
Two RNNs can be run front-to-back in an '''encoder-decoder''' configuration. The encoder RNN processes an input sequence into a sequence of hidden vectors, and the decoder RNN processes the sequence of hidden vectors into an output sequence, with an optional [[Attention (machine learning)|attention mechanism]]. This configuration was used to construct state-of-the-art [[Neural machine translation|neural machine translators]] during the 2014–2017 period, and was an instrumental step towards the development of [[Transformer (deep learning architecture)|Transformers]].<ref>{{Cite journal |last1=Vaswani |first1=Ashish |last2=Shazeer |first2=Noam |last3=Parmar |first3=Niki |last4=Uszkoreit |first4=Jakob |last5=Jones |first5=Llion |last6=Gomez |first6=Aidan N |last7=Kaiser |first7=Ł ukasz |last8=Polosukhin |first8=Illia |date=2017 |title=Attention is All you Need |url=https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Curran Associates, Inc. |volume=30}}</ref>
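The encoder-decoder flow can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not any particular published system: the weight shapes, the single Elman-style cell, and the dot-product attention over encoder states are all simplifying assumptions made here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
H = 4  # hidden/vector size (illustrative)

def make_cell():
    # Random small weights for a hypothetical Elman-style RNN cell.
    return {"Wx": rng.normal(size=(H, H)) * 0.1,
            "Wh": rng.normal(size=(H, H)) * 0.1,
            "b": np.zeros(H)}

def step(cell, h, x):
    # One recurrence step: h_t = tanh(Wx x_t + Wh h_{t-1} + b)
    return np.tanh(cell["Wx"] @ x + cell["Wh"] @ h + cell["b"])

def encode(cell, xs):
    # Encoder: consume the input sequence, emit one hidden vector per token.
    h, states = np.zeros(H), []
    for x in xs:
        h = step(cell, h, x)
        states.append(h)
    return states

def decode(cell, states, n_out):
    # Decoder: start from the last encoder state; at each step, form a
    # context vector by (optional) dot-product attention over all encoder
    # states and feed it back into the recurrence.
    h, ys = states[-1], []
    for _ in range(n_out):
        scores = np.array([h @ s for s in states])
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        context = sum(w * s for w, s in zip(weights, states))
        h = step(cell, h, context)
        ys.append(h)
    return ys

enc, dec = make_cell(), make_cell()
xs = [rng.normal(size=H) for _ in range(5)]   # toy input sequence
ys = decode(dec, encode(enc, xs), n_out=3)    # decode 3 output vectors
```

In a real translator the decoder would additionally project each hidden vector to a vocabulary distribution and feed the previously emitted token back in; that readout is omitted here to keep the recurrence itself visible.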