Recurrent neural network: Difference between revisions

A major problem with gradient descent for standard RNN architectures is that [[Vanishing gradient problem|error gradients vanish]] exponentially quickly with the size of the time lag between important events.<ref name="hochreiter1991" /><ref name="HOCH2001">{{cite book |last=Hochreiter |first=Sepp |title=A Field Guide to Dynamical Recurrent Networks |date=15 January 2001 |publisher=John Wiley & Sons |isbn=978-0-7803-5369-5 |editor-last1=Kolen |editor-first1=John F. |chapter=Gradient flow in recurrent nets: the difficulty of learning long-term dependencies |display-authors=etal |editor-last2=Kremer |editor-first2=Stefan C. |chapter-url={{google books |plainurl=y |id=NWOcMVA64aAC }}}}</ref> LSTM combined with a BPTT/RTRL hybrid learning method attempts to overcome these problems.<ref name="lstm" /> The independently recurrent neural network (IndRNN)<ref name="auto" /> addresses the same problem by reducing the context of a neuron to its own past state; cross-neuron information is then explored in the following layers. With this approach, memories of different ranges, including long-term memory, can be learned without the vanishing and exploding gradient problems.
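The exponential decay described above can be illustrated with a minimal sketch (not taken from the cited sources): in backpropagation through time, the error gradient at a time lag of ''k'' steps involves a product of ''k'' recurrent Jacobians, so when the recurrent weight matrix has spectral norm below one, the gradient norm shrinks roughly geometrically with the lag. The matrix size and scale factor here are arbitrary choices for the demonstration.

```python
import numpy as np

# Random recurrent weight matrix scaled so its spectral norm is below 1;
# with this scaling, repeated Jacobian products must contract.
rng = np.random.default_rng(0)
n = 32
W = rng.standard_normal((n, n)) * 0.3 / np.sqrt(n)

# Backpropagate a unit gradient through 50 time steps of a linear RNN:
# each step multiplies by the transposed recurrent Jacobian W^T.
grad = np.ones(n)
norms = []
for k in range(50):
    grad = W.T @ grad
    norms.append(np.linalg.norm(grad))

# The gradient norm decays roughly geometrically with the time lag.
print(norms[0], norms[9], norms[49])
```

A saturating nonlinearity such as tanh only makes the contraction stronger, since its derivative is at most 1; this is why architectures such as LSTM and IndRNN restructure the recurrence rather than rely on gradient descent through long chains of Jacobians.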
 
An [[online algorithm]] called '''causal recursive backpropagation''' (CRBP) implements and combines the BPTT and RTRL paradigms for locally recurrent networks.<ref>{{Cite journal |last1=Campolucci |first1=Paolo |last2=Uncini |first2=Aurelio |last3=Piazza |first3=Francesco |last4=Rao |first4=Bhaskar D. |year=1999 |title=On-Line Learning Algorithms for Locally Recurrent Neural Networks |journal=IEEE Transactions on Neural Networks |volume=10 |issue=2 |pages=253–271 |citeseerx=10.1.1.33.7550 |doi=10.1109/72.750549 |pmid=18252525}}</ref> It works with the most general locally recurrent networks and can minimize the global error term, which improves the stability of the algorithm and provides a unifying view of gradient-calculation techniques for recurrent networks with local feedback.
 
One approach to computing gradient information in RNNs with arbitrary architectures is based on diagrammatic derivation with signal-flow graphs.<ref>{{Cite journal |last1=Wan |first1=Eric A. |last2=Beaufays |first2=Françoise |year=1996 |title=Diagrammatic derivation of gradient algorithms for neural networks |journal=Neural Computation |volume=8 |pages=182–201 |doi=10.1162/neco.1996.8.1.182 |s2cid=15512077}}</ref> It uses the BPTT batch algorithm, based on Lee's theorem for network sensitivity calculations.<ref name="ReferenceA">{{Cite journal |last1=Campolucci |first1=Paolo |last2=Uncini |first2=Aurelio |last3=Piazza |first3=Francesco |year=2000 |title=A Signal-Flow-Graph Approach to On-line Gradient Calculation |journal=Neural Computation |volume=12 |issue=8 |pages=1901–1927 |citeseerx=10.1.1.212.5406 |doi=10.1162/089976600300015196 |pmid=10953244 |s2cid=15090951}}</ref> The approach was proposed by Wan and Beaufays, while a fast online version was proposed by Campolucci, Uncini and Piazza.<ref name="ReferenceA" />