Recurrent neural network: Difference between revisions

A major problem with gradient descent for standard RNN architectures is that [[Vanishing gradient problem|error gradients vanish]] exponentially quickly with the size of the time lag between important events.<ref name="hochreiter1991" /><ref name="HOCH2001">{{cite book |last=Hochreiter |first=Sepp |title=A Field Guide to Dynamical Recurrent Networks |date=15 January 2001 |publisher=John Wiley & Sons |isbn=978-0-7803-5369-5 |editor-last1=Kolen |editor-first1=John F. |chapter=Gradient flow in recurrent nets: the difficulty of learning long-term dependencies |display-authors=etal |editor-last2=Kremer |editor-first2=Stefan C. |chapter-url={{google books |plainurl=y |id=NWOcMVA64aAC }}}}</ref> LSTM combined with a BPTT/RTRL hybrid learning method attempts to overcome these problems.<ref name="lstm" /> The independently recurrent neural network (IndRNN)<ref name="auto" /> addresses the same problem by reducing the context of a neuron to its own past state; cross-neuron information is then explored in the following layers. With this approach, memories of different ranges, including long-term memory, can be learned without the vanishing and exploding gradient problems.
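The exponential decay described above can be illustrated with a minimal sketch (not taken from the cited sources): in backpropagation through time, the error gradient at a time lag of ''k'' steps involves a product of ''k'' recurrent Jacobians, so when the recurrent weight matrix has spectral norm below one, the gradient norm shrinks roughly geometrically with the lag. The matrix size and scale factor here are arbitrary choices for the demonstration.

```python
import numpy as np

# Random recurrent weight matrix scaled so its spectral norm is below 1;
# with this scaling, repeated Jacobian products must contract.
rng = np.random.default_rng(0)
n = 32
W = rng.standard_normal((n, n)) * 0.3 / np.sqrt(n)

# Backpropagate a unit gradient through 50 time steps of a linear RNN:
# each step multiplies by the transposed recurrent Jacobian W^T.
grad = np.ones(n)
norms = []
for k in range(50):
    grad = W.T @ grad
    norms.append(np.linalg.norm(grad))

# The gradient norm decays roughly geometrically with the time lag.
print(norms[0], norms[9], norms[49])
```

A saturating nonlinearity such as tanh only makes the contraction stronger, since its derivative is at most 1; this is why architectures such as LSTM and IndRNN restructure the recurrence rather than rely on gradient descent through long chains of Jacobians.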
 
An [[online algorithm]] called '''causal recursive backpropagation''' (CRBP) implements and combines the BPTT and RTRL paradigms for locally recurrent networks.<ref>{{Cite journal |last1=Campolucci |first1=Paolo |last2=Uncini |first2=Aurelio |last3=Piazza |first3=Francesco |last4=Rao |first4=Bhaskar D. |year=1999 |title=On-Line Learning Algorithms for Locally Recurrent Neural Networks |journal=IEEE Transactions on Neural Networks |volume=10 |issue=2 |pages=253–271 |citeseerx=10.1.1.33.7550 |doi=10.1109/72.750549 |pmid=18252525}}</ref> It works with the most general locally recurrent networks and can minimize the global error term, which improves the stability of the algorithm and provides a unifying view of gradient-calculation techniques for recurrent networks with local feedback.
 
One approach to computing gradient information in RNNs with arbitrary architectures is based on diagrammatic derivation with signal-flow graphs.<ref>{{Cite journal |last1=Wan |first1=Eric A. |last2=Beaufays |first2=Françoise |year=1996 |title=Diagrammatic derivation of gradient algorithms for neural networks |journal=Neural Computation |volume=8 |pages=182–201 |doi=10.1162/neco.1996.8.1.182 |s2cid=15512077}}</ref> It uses the BPTT batch algorithm, based on Lee's theorem for network sensitivity calculations.<ref name="ReferenceA">{{Cite journal |last1=Campolucci |first1=Paolo |last2=Uncini |first2=Aurelio |last3=Piazza |first3=Francesco |year=2000 |title=A Signal-Flow-Graph Approach to On-line Gradient Calculation |journal=Neural Computation |volume=12 |issue=8 |pages=1901–1927 |citeseerx=10.1.1.212.5406 |doi=10.1162/089976600300015196 |pmid=10953244 |s2cid=15090951}}</ref> The approach was proposed by Wan and Beaufays, while a fast online version was proposed by Campolucci, Uncini and Piazza.<ref name="ReferenceA" />