Bidirectional recurrent neural networks

{{short description|Type of artificial neural network}}
{{Orphan|date=March 2016}}
 
'''Bidirectional [[recurrent neural networks]]''' ('''BRNN''') connect two hidden layers of opposite directions to the same output. With this form of [[Generative model|generative deep learning]], the output layer can get information from past (backwards) and future (forward) states simultaneously. BRNNs were introduced in 1997 by Schuster and Paliwal<ref name="Schuster">Schuster, Mike, and Kuldip K. Paliwal. "[https://www.researchgate.net/profile/Mike_Schuster/publication/3316656_Bidirectional_recurrent_neural_networks/links/56861d4008ae19758395f85c.pdf Bidirectional recurrent neural networks]." Signal Processing, IEEE Transactions on 45.11 (1997): 2673–2681.</ref> to increase the amount of input information available to the network. For example, [[multilayer perceptron]]s (MLPs) and [[time delay neural network]]s (TDNNs) have limitations on input data flexibility, as they require their input data to be of fixed size. Standard [[recurrent neural network]]s (RNNs) also have restrictions, as future input information cannot be reached from the current state. In contrast, BRNNs do not require their input data to be of fixed size, and their future input information is reachable from the current state.<ref>{{Cite arXiv|title=Recent Advances in Recurrent Neural Networks|eprint=1801.01078|last1=Salehinejad|first1=Hojjat|last2=Sankar|first2=Sharan|last3=Barfett|first3=Joseph|last4=Colak|first4=Errol|last5=Valaee|first5=Shahrokh|year=2017|class=cs.NE}}</ref>
 
BRNNs are especially useful when the context of the input is needed. For example, in [[handwriting recognition]], performance can be enhanced by knowledge of the letters located before and after the current letter.
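
Many deep learning libraries provide bidirectional variants of their recurrent layers. The following is a minimal illustrative sketch, assuming [[PyTorch]]'s <code>torch.nn.RNN</code> layer with <code>bidirectional=True</code>; the layer sizes and input shapes are arbitrary examples, not taken from the cited sources:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# A bidirectional RNN layer: two hidden layers of opposite directions
# feeding the same output (their states are concatenated per time step).
rnn = nn.RNN(input_size=8, hidden_size=16, bidirectional=True, batch_first=True)

x = torch.randn(1, 5, 8)   # one sequence of 5 time steps, 8 features each
output, h_n = rnn(x)

# At every time step the output holds both the forward and backward state,
# so 2 * hidden_size = 32 features are available simultaneously.
print(output.shape)        # torch.Size([1, 5, 32])
</syntaxhighlight>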
 
==Architecture==
[[File:Structural diagrams of unidirectional and bidirectional recurrent neural networks.png|thumbnail|Structure of RNN and BRNN<ref name="Schuster" />|alt=|350x350px]]
 
The principle of BRNN is to split the neurons of a regular RNN into two directions, one for the positive time direction (forward states) and another for the negative time direction (backward states). The outputs of these two sets of states are not connected to the inputs of the opposite-direction states. The general structure of RNN and BRNN is depicted in the diagram on the right. By using two time directions, input information from the past and the future of the current time frame can be used, unlike standard RNNs, which require delays to include future information.<ref name="Schuster" />
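
The two independent state recurrences and the combined output can be written out directly. The following schematic Python/NumPy sketch illustrates this structure; the weight names <code>Wf</code>, <code>Uf</code>, <code>Wb</code>, <code>Ub</code>, <code>Vf</code>, <code>Vb</code> and their shapes are assumptions made for the example, not the notation of Schuster and Paliwal:

<syntaxhighlight lang="python">
import numpy as np

def brnn_forward(x, Wf, Uf, Wb, Ub, Vf, Vb):
    """Schematic BRNN forward pass for one sequence x of shape (T, input_dim).

    Wf, Uf: input and recurrent weights of the forward states.
    Wb, Ub: input and recurrent weights of the backward states.
    Vf, Vb: output weights reading the forward and backward states.
    The two directions never feed into each other; only the output combines them.
    """
    T = x.shape[0]
    H = Wf.shape[0]
    h_f = np.zeros((T, H))   # forward states, computed for t = 0 .. T-1
    h_b = np.zeros((T, H))   # backward states, computed for t = T-1 .. 0
    for t in range(T):
        prev = h_f[t - 1] if t > 0 else np.zeros(H)
        h_f[t] = np.tanh(Wf @ x[t] + Uf @ prev)
    for t in reversed(range(T)):
        nxt = h_b[t + 1] if t < T - 1 else np.zeros(H)
        h_b[t] = np.tanh(Wb @ x[t] + Ub @ nxt)
    # Each output sees the past (via h_f) and the future (via h_b) of step t.
    y = h_f @ Vf.T + h_b @ Vb.T
    return y, h_f, h_b
</syntaxhighlight>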
 
==Training==
 
BRNNs can be trained using algorithms similar to those for RNNs, because the neurons of the two directions do not interact with each other. However, when [[backpropagation through time]] is applied, additional processes are needed because updating the input and output layers cannot be done at once. The general training procedure is as follows: in the forward pass, the forward states and backward states are computed first, and then the output neurons are computed. In the backward pass, the output neurons are processed first, and then the forward and backward states. After the forward and backward passes are done, the weights are updated.<ref name="Schuster" />
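
In frameworks with automatic differentiation, this ordering is handled implicitly: one forward pass computes both directional state sequences and then the outputs, one backward pass propagates gradients through the outputs and both state sequences, and the weights are updated afterwards. The following sketch of a single training step assumes PyTorch; the model, layer sizes, and data are made up for illustration:

<syntaxhighlight lang="python">
import torch
import torch.nn as nn

# Hypothetical per-time-step tagging model: a bidirectional RNN
# followed by an output layer that reads both directions.
class BRNNTagger(nn.Module):
    def __init__(self, input_size=8, hidden_size=16, num_classes=4):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size,
                          bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden_size, num_classes)

    def forward(self, x):
        states, _ = self.rnn(x)   # forward and backward states for every step
        return self.out(states)   # outputs computed after both directions

model = BRNNTagger()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(2, 5, 8)               # 2 sequences, 5 steps, 8 features
targets = torch.randint(0, 4, (2, 5))  # one class label per time step

logits = model(x)                                   # forward pass: states, then outputs
loss = loss_fn(logits.reshape(-1, 4), targets.reshape(-1))
loss.backward()        # backpropagation through time over both directions
optimizer.step()       # weights updated once both passes are done
optimizer.zero_grad()
</syntaxhighlight>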
 
==Applications==
Applications of BRNN include:
 
*Speech Recognition (Combined with [[Long short-term memory]])<ref>Graves, Alex, Santiago Fernández, and Jürgen Schmidhuber. "[https://mediatum.ub.tum.de/doc/1290195/file.pdf Bidirectional LSTM networks for improved phoneme classification and recognition]." Artificial Neural Networks: Formal Models and Their Applications–ICANN 2005. Springer Berlin Heidelberg, 2005. 799-804.
</ref><ref>Graves, Alan, Navdeep Jaitly, and Abdel-rahman Mohamed. "[http://www.cs.toronto.edu/~graves/asru_2013.pdf Hybrid speech recognition with deep bidirectional LSTM]." Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013.</ref>
 
*Translation<ref>Sundermeyer, Martin, et al. "[https://www.aclweb.org/anthology/D14-1003 Translation modeling with bidirectional recurrent neural networks]." Proceedings of the Conference on Empirical Methods in Natural Language Processing, October 2014.</ref>
*Handwriting Recognition<ref>Liwicki, Marcus, et al. "[https://mediatum.ub.tum.de/doc/1289961/file.pdf A novel approach to on-line handwriting recognition based on bidirectional long short-term memory networks]." Proc. 9th Int. Conf. on Document Analysis and Recognition. Vol. 1. 2007.</ref>
*Industrial [[Soft sensor]]<ref>Lui, Chun Fai, et al. "[https://ieeexplore.ieee.org/ielx7/19/9717300/09718226.pdf A Supervised Bidirectional Long Short-Term Memory Network for Data-Driven Dynamic Soft Sensor Modeling]." IEEE Transactions on Instrumentation and Measurement 71 (2022): 1-13.</ref>
*Protein Structure Prediction<ref>Baldi, Pierre, et al. "[https://academic.oup.com/bioinformatics/article-pdf/15/11/937/693153/150937.pdf Exploiting the past and the future in protein secondary structure prediction]." Bioinformatics 15.11 (1999): 937-946.</ref><ref>Pollastri, Gianluca, and Aoife Mclysaght. "[https://academic.oup.com/bioinformatics/article/21/8/1719/250163 Porter: a new, accurate server for protein secondary structure prediction]." Bioinformatics 21.8 (2005): 1719-1720.</ref>
*Part-of-speech tagging
*Dependency Parsing<ref>{{Cite journal|last1=Kiperwasser|first1=Eliyahu|last2=Goldberg|first2=Yoav|date=2016|title=Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations|url=https://www.aclweb.org/anthology/Q16-1023/|journal=Transactions of the Association for Computational Linguistics|language=en-us|volume=4|pages=313–327|doi=10.1162/tacl_a_00101|arxiv=1603.04351|bibcode=2016arXiv160304351K|s2cid=1642392}}</ref><ref>Grella and Cangialosi. "Non-Projective Dependency Parsing via Latent Heads Representation" (2018).</ref>
*Entity Extraction<ref>{{Cite arXiv|last1=Dernoncourt|first1=Franck|last2=Lee|first2=Ji Young|last3=Szolovits|first3=Peter|date=2017-05-15|title=NeuroNER: an easy-to-use program for named-entity recognition based on neural networks|eprint=1705.05487|class=cs.CL}}</ref>
 
==See also==
 
* [[Artificial neural network]]
* [[Recurrent neural networks]]
* [[Long short-term memory]]
 
==References==
{{Reflist}}

==External links==
*[https://github.com/hycis/bidirectional_RNN Implementation of BRNN/LSTM in Python with Theano]
 
[[Category:Neural network architectures]]