Time delay neural network: Difference between revisions

Link completed
Link corrected
Line 8:
 
== History ==
The TDNN was introduced in the late 1980s and applied to [[phoneme]] classification for automatic [[speech recognition]], where the automatic determination of precise segment or feature boundaries in the speech signal was difficult or impossible. Because the TDNN recognizes phonemes and their underlying acoustic/phonetic features independent of their position in time, it improved performance over static classification.<ref name="phoneme detection" /><ref name=":0">Alexander Waibel, [https://isl.iar.kit.edu/downloads/Pheome_Recognition_Using_Time-Delay_Neural_Networks_SP87-100_6.pdf Phoneme Recognition Using Time-Delay Neural Networks], Procedures of the Institute of Electrical, Information and Communication Engineers (IEICE), December, 1987, Tokyo, Japan.</ref> It was also applied to two-dimensional signals (time-frequency patterns in speech<ref name=":1">John B. Hampshire and Alexander Waibel, ''[http://papers.nips.cc/paper/213-connectionist-architectures-for-multi-speaker-phoneme-recognition.pdf Connectionist Architectures for Multi-Speaker Phoneme Recognition] {{Webarchive|url=https://web.archive.org/web/20160411092444/http://papers.nips.cc/paper/213-connectionist-architectures-for-multi-speaker-phoneme-recognition.pdf |date=2016-04-11 }}'', Advances in Neural Information Processing Systems, 1990, Morgan Kaufmann.</ref> and coordinate-space patterns in OCR<ref name=":2">Stefan Jaeger, Stefan Manke, Juergen Reichert, Alexander Waibel, ''[https://www.researchgate.net/profile/Stefan_Jaeger/publication/220163530_Online_handwriting_recognition_the_NPen_recognizer_Int_J_Doc_Anal_Recognit_3169-180/links/0c96051af3e6133ed0000000.pdf Online handwriting recognition: the NPen++ recognizer]'', International Journal on Document Analysis and Recognition Vol. 3, Issue 3, March 2001</ref>).
 
[[Kunihiko Fukushima]] published the [[neocognitron]] in 1980.<ref name="intro">{{cite journal |last=Fukushima |first=Kunihiko |year=1980 |title=Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position |url=https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf |url-status=live |journal=Biological Cybernetics |volume=36 |issue=4 |pages=193–202 |doi=10.1007/BF00344251 |pmid=7370364 |s2cid=206775608 |archive-url=https://web.archive.org/web/20140603013137/http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf |archive-date=3 June 2014 |access-date=16 November 2013}}</ref> [[Max pooling]] appears in a 1982 publication on the neocognitron<ref>{{Cite journal |last1=Fukushima |first1=Kunihiko |last2=Miyake |first2=Sei |date=1982-01-01 |title=Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position |url=https://www.sciencedirect.com/science/article/abs/pii/0031320382900243 |journal=Pattern Recognition |volume=15 |issue=6 |pages=455–469 |doi=10.1016/0031-3203(82)90024-3 |bibcode=1982PatRe..15..455F |issn=0031-3203}}</ref> and in the 1989 publication on [[LeNet|LeNet-5]].<ref>{{Cite journal |last1=LeCun |first1=Yann |last2=Boser |first2=Bernhard |last3=Denker |first3=John |last4=Henderson |first4=Donnie |last5=Howard |first5=R. |last6=Hubbard |first6=Wayne |last7=Jackel |first7=Lawrence |date=1989 |title=Handwritten Digit Recognition with a Back-Propagation Network |url=https://proceedings.neurips.cc/paper/1989/hash/53c3bce66e43be4f209556518c2fcb54-Abstract.html |journal=Advances in Neural Information Processing Systems |publisher=Morgan-Kaufmann |volume=2}}</ref>
Line 37:
 
=== Large vocabulary speech recognition ===
Large vocabulary speech recognition requires recognizing sequences of phonemes that make up words, subject to the constraints of a large pronunciation vocabulary. TDNNs can be integrated into large vocabulary speech recognizers by introducing state transitions and search between the phonemes that make up a word. The resulting Multi-State Time-Delay Neural Network (MS-TDNN) can be trained discriminatively at the word level, thereby optimizing the entire arrangement toward word recognition instead of phoneme classification.<ref name=":6" /><ref name=":7">C. Bregler, H. Hild, S. Manke, and A. Waibel, ''[https://ieeexplore.ieee.org/document/319179 Improving connected letter recognition by lipreading]'', 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, MN, USA, 1993, pp. 557–560, vol. 1, doi: 10.1109/ICASSP.1993.319179.</ref><ref name=":2" />
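
The idea of searching over state transitions between the phonemes of a word can be illustrated with a minimal sketch. It assumes a hypothetical matrix <code>frame_scores[t][p]</code> of per-frame phoneme scores (such as a TDNN might produce) and a hypothetical vocabulary mapping each word to a sequence of phoneme indices. The function names and the additive scoring rule are illustrative assumptions, not the published MS-TDNN formulation.

```python
from typing import Dict, Sequence

NEG_INF = float("-inf")


def word_score(frame_scores: Sequence[Sequence[float]],
               phonemes: Sequence[int]) -> float:
    """Best monotonic alignment score of a phoneme sequence to the frames.

    frame_scores[t][p] is an assumed per-frame score for phoneme p at time t.
    Every phoneme state must cover at least one frame, in order.
    """
    n_frames, n_states = len(frame_scores), len(phonemes)
    if n_states == 0 or n_frames < n_states:
        return NEG_INF
    # best[s]: best score of any path that is in state s at the current frame
    best = [NEG_INF] * n_states
    best[0] = frame_scores[0][phonemes[0]]
    for t in range(1, n_frames):
        new = [NEG_INF] * n_states
        for s in range(n_states):
            stay = best[s]                                # remain in state s
            advance = best[s - 1] if s > 0 else NEG_INF   # move on from s-1
            prev = max(stay, advance)
            if prev > NEG_INF:
                new[s] = prev + frame_scores[t][phonemes[s]]
        best = new
    return best[-1]  # the path must end in the last phoneme state


def recognize(frame_scores: Sequence[Sequence[float]],
              vocabulary: Dict[str, Sequence[int]]) -> str:
    """Return the vocabulary word with the highest alignment score."""
    return max(vocabulary, key=lambda w: word_score(frame_scores, vocabulary[w]))


# Toy example: 4 frames, 3 phonemes (indices 0, 1, 2), made-up scores.
frames = [[0.9, 0.1, 0.0],
          [0.8, 0.2, 0.0],
          [0.1, 0.7, 0.2],
          [0.0, 0.2, 0.8]]
vocab = {"ab": [0, 1], "ac": [0, 2], "abc": [0, 1, 2]}
best_word = recognize(frames, vocab)
```

Because the word-level score is built from the per-frame phoneme scores along the best path, an error signal defined on the word score can in principle be propagated back to the phoneme level, which is the intuition behind training at the word level.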
 
=== Speaker independence ===
Line 67:
 
== References ==
{{reflist}}<ref>{{Cite journal |last=Hampshire |first=John |last2=Waibel |first2=Alex |orig-date=November 30, 1989 |editor-last=Touretzky |editor-first=David |title=Connectionist Architectures for Multi-Speaker Phoneme Recognition |url=http://papers.nips.cc/paper/213-connectionist-architectures-for-multi-speaker-phoneme-recognition |journal=Advances in Neural Information Processing Systems 2 |pages=203–210}}</ref>
<ref>{{Cite journal |last=Waibel |first=Alex |last2=Hanazawa |first2=Toshiyuki |last3=Hinton |first3=Geoffrey |last4=Shikano |first4=Kiyohiro |last5=Lang |first5=Kevin |date=April 1989 |title=Phoneme recognition using time-delay neural networks |url=https://www.researchgate.net/publication/391037926_Phoneme_Recognition_Using_Time-Delay_Neural_Networks#fullTextFileContent |journal=IEEE Transactions on Acoustics, Speech, and Signal Processing |volume=37 |pages=328–339 |doi=10.1109/29.21701}}</ref>
<ref>{{Cite journal |last=Waibel |first=Alex |date=December 1987 |title=Phoneme Recognition Using Time-Delay Neural Networks |url=https://www.researchgate.net/publication/391037926_Phoneme_Recognition_Using_Time-Delay_Neural_Networks#fullTextFileContent |journal=Conference: Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE) |location=Japan}}</ref>
[[Category:Neural network architectures]]