Revision as of 18:13, 28 April 2025 by 176.162.190.101 (Link completed; Tag: Visual edit) compared with revision as of 15:57, 8 May 2025 by 81.252.62.13 (Link corrected; Tag: Visual edit)
Line 8: == History == The TDNN was introduced in the late 1980s and applied to the task of [[phoneme]] classification for automatic [[speech recognition]] in speech signals where the automatic determination of precise segments or feature boundaries was difficult or impossible. Because the TDNN recognizes phonemes and their underlying acoustic/phonetic features independently of their position in time, it improved performance over static classification.<ref name="phoneme detection" /><ref name=":0">Alexander Waibel, [https://isl.iar.kit.edu/downloads/Pheome_Recognition_Using_Time-Delay_Neural_Networks_SP87-100_6.pdf Phoneme Recognition Using Time-Delay Neural Networks], Proceedings of the Institute of Electrical, Information and Communication Engineers (IEICE), December, 1987, Tokyo, Japan.</ref> It was also applied to two-dimensional signals (time-frequency patterns in speech,<ref name=":1">~~John B. Hampshire and Alexander Waibel, ''[~~http://papers.nips.cc/paper/213-connectionist-architectures-for-multi-speaker-phoneme-recognition.pdf Connectionist Architectures for Multi-Speaker Phoneme Recognition] {{Webarchive\|url=https://web.archive.org/web/20160411092444/http://papers.nips.cc/paper/213-connectionist-architectures-for-multi-speaker-phoneme-recognition.pdf \|date=2016-04-11 }}'', Advances in Neural Information Processing Systems, 1990, Morgan Kaufmann.</ref> and coordinate-space patterns in OCR<ref name=":2">Stefan Jaeger, Stefan Manke, Juergen Reichert, Alexander Waibel, ''[https://www.researchgate.net/profile/Stefan_Jaeger/publication/220163530_Online_handwriting_recognition_the_NPen_recognizer_Int_J_Doc_Anal_Recognit_3169-180/links/0c96051af3e6133ed0000000.pdf Online handwriting recognition: the NPen++recognizer]'', International Journal on Document Analysis and Recognition Vol. 3, Issue 3, March 2001</ref>).
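The position-independent recognition described above can be illustrated with a minimal Python sketch. It assumes a single time-delay unit with a 3-frame window (delays 0, 1, 2) and a tanh nonlinearity, whose outputs are summed over time. The helper names (`tdnn_layer`, `integrate_over_time`, `embed`), the weights, the bias and the input pattern are all hypothetical illustrations. They are not the architecture or parameters of the cited papers. Because the same weights are applied at every time step, the unit's response shifts along with the input. As a result, a time-integrated score is unchanged when the pattern moves.

```python
import math
from typing import List, Sequence

# A "frame" is one feature vector per time step (e.g. spectral coefficients).
Frame = Sequence[float]


def tdnn_layer(x: Sequence[Frame], weights: Sequence[Sequence[float]], bias: float) -> List[float]:
    """One time-delay unit (hypothetical helper).

    At each time step t the unit sees frames t, t+1, ..., t+D-1, where D = len(weights),
    and applies the SAME weights (weights[d] for delay d) at every t. Only positions
    where the whole window fits inside x are computed (no padding).
    """
    delays = len(weights)
    out: List[float] = []
    for t in range(len(x) - delays + 1):
        s = bias
        for d in range(delays):
            s += sum(w * v for w, v in zip(weights[d], x[t + d]))
        out.append(math.tanh(s))
    return out


def integrate_over_time(activations: Sequence[float]) -> float:
    """Sum a unit's evidence over all time steps (assumed output stage).

    math.fsum returns the correctly rounded sum regardless of order, so the same
    set of activations always yields exactly the same score.
    """
    return math.fsum(activations)


def embed(pattern: Sequence[Frame], offset: int, length: int, dim: int) -> List[List[float]]:
    """Place a short pattern into an all-zero sequence starting at `offset` (hypothetical helper)."""
    seq = [[0.0] * dim for _ in range(length)]
    for i, frame in enumerate(pattern):
        seq[offset + i] = list(frame)
    return seq


# Made-up 3-frame, 2-dimensional "phoneme" pattern and made-up weights (D = 3 delays).
pattern = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
weights = [[0.4, -0.2], [0.3, 0.3], [-0.1, 0.6]]
bias = -0.1

early = embed(pattern, offset=2, length=12, dim=2)  # pattern occupies frames 2 to 4
late = embed(pattern, offset=6, length=12, dim=2)   # same pattern, 4 frames later

a_early = tdnn_layer(early, weights, bias)
a_late = tdnn_layer(late, weights, bias)

# The unit's response is shifted by 4 steps but otherwise identical.
print(a_late[4:9] == a_early[0:5])  # → True
# Hence the time-integrated score does not depend on where the pattern occurs.
print(integrate_over_time(a_early) == integrate_over_time(a_late))  # → True
```

The key design choice is weight sharing across time. No parameter is tied to an absolute frame index, so the unit does not need the exact start frame of the pattern.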
[[Kunihiko Fukushima]] published the [[neocognitron]] in 1980.<ref name="intro">{{cite journal \|last=Fukushima \|first=Kunihiko \|year=1980 \|title=Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position \|url=https://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf \|url-status=live \|journal=Biological Cybernetics \|volume=36 \|issue=4 \|pages=193–202 \|doi=10.1007/BF00344251 \|pmid=7370364 \|s2cid=206775608 \|archive-url=https://web.archive.org/web/20140603013137/http://www.cs.princeton.edu/courses/archive/spr08/cos598B/Readings/Fukushima1980.pdf \|archive-date=3 June 2014 \|access-date=16 November 2013}}</ref> [[Max pooling]] appears in a 1982 publication on the neocognitron<ref>{{Cite journal \|last1=Fukushima \|first1=Kunihiko \|last2=Miyake \|first2=Sei \|date=1982-01-01 \|title=Neocognitron: A new algorithm for pattern recognition tolerant of deformations and shifts in position \|url=https://www.sciencedirect.com/science/article/abs/pii/0031320382900243 \|journal=Pattern Recognition \|volume=15 \|issue=6 \|pages=455–469 \|doi=10.1016/0031-3203(82)90024-3 \|bibcode=1982PatRe..15..455F \|issn=0031-3203}}</ref> and also appears in the 1989 publication on [[LeNet\|LeNet-5]].<ref>{{Cite journal \|last1=LeCun \|first1=Yann \|last2=Boser \|first2=Bernhard \|last3=Denker \|first3=John \|last4=Henderson \|first4=Donnie \|last5=Howard \|first5=R.
\|last6=Hubbard \|first6=Wayne \|last7=Jackel \|first7=Lawrence \|date=1989 \|title=Handwritten Digit Recognition with a Back-Propagation Network \|url=https://proceedings.neurips.cc/paper/1989/hash/53c3bce66e43be4f209556518c2fcb54-Abstract.html \|journal=Advances in Neural Information Processing Systems \|publisher=Morgan-Kaufmann \|volume=2}}</ref> Line 37: === Large vocabulary speech recognition === Large-vocabulary speech recognition requires recognizing the sequences of phonemes that make up words, subject to the constraints of a large pronunciation vocabulary. TDNNs can be integrated into large-vocabulary speech recognizers by introducing state transitions and search between the phonemes that make up a word. The resulting Multi-State Time-Delay Neural Network (MS-TDNN) can be trained discriminatively at the word level, thereby optimizing the entire system for word recognition instead of phoneme classification.<ref name=":6" /><ref name=":7">~~Christoph~~[https://ieeexplore.ieee.org/document/319179 C. Bregler, ~~Hermann~~H. Hild, ~~Stefan~~S. Manke, ~~Alexander~~and A. Waibel, ~~''[http://isl.anthropomatik.kit.edu/cmu-kit/downloads/Improving_Connected_Letter_Recognition_by_Lipreading.pdf~~ "Improving ~~Connected~~connected ~~Letter~~letter ~~Recognition~~recognition by ~~Lipreading]''~~lipreading," ~~IEEE~~1993 ~~Proceedings~~IEEE International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, MN, USA, 1993, pp.
557-560 vol.1, doi: 10.1109/ICASSP.1993.319179.]</ref><ref name=":2" /> === Speaker independence === Line 67: == References == {{reflist}}<ref>{{Cite journal \|last=~~Waibel~~Hampshire \|first=~~Alex~~John \|last2=~~Hanazawa~~Waibel \|first2=~~Toshiyuki~~Alex \|~~last3~~orig-date=~~Hinton~~November ~~\|first3=Geoffrey~~30, ~~\|last4=Shikano~~1989 \|~~first4~~editor-last=~~Kiyohiro~~Touretzky \|~~last5~~editor-first=~~Lang \|first5=Kevin \|date=April 1989~~David \|title=~~Phoneme~~Connectionist ~~recognition~~Architectures ~~using~~for ~~time~~Multi-~~delay~~Speaker ~~neural~~Phoneme ~~networks~~Recognition \|url=~~https~~http://~~www~~papers.~~researchgate~~nips.~~net~~cc/~~publication~~paper/~~391037926_Phoneme_Recognition_Using_Time~~213-~~Delay_Neural_Networks#fullTextFileContent~~connectionist-architectures-for-multi-speaker-phoneme-recognition \|journal=~~Acoustics,~~Advances ~~Speech~~in ~~and~~Neural ~~Signal~~Information Processing, ~~IEEE~~Systems ~~Transactions on~~2 \|~~volume~~page=~~37 \|pages=328~~ 203- ~~339 \|doi=10.1109/29.21701~~210}}</ref> <ref>{{Cite journal \|last=Waibel \|first=Alex \|last2=Hanazawa \|first2=Toshiyuki \|last3=Hinton \|first3=Geoffrey \|last4=Shikano \|first4=Kiyohiro \|last5=Lang \|first5=Kevin \|date=April 1989 \|title=Phoneme recognition using time-delay neural networks \|url=https://www.researchgate.net/publication/391037926_Phoneme_Recognition_Using_Time-Delay_Neural_Networks#fullTextFileContent \|journal=Acoustics, Speech and Signal Processing, IEEE Transactions on \|volume=37 \|pages=328 - 339 \|doi=10.1109/29.21701}}</ref> <ref>{{Cite journal \|last=Waibel \|first=Alex \|date=1987 \|orig-date=December \|title=Phoneme Recognition Using Time-Delay Neural Networks \|url=https://www.researchgate.net/publication/391037926_Phoneme_Recognition_Using_Time-Delay_Neural_Networks#fullTextFileContent \|journal=Conference: Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE) 
\|location=Japan}}</ref> [[Category:Neural network architectures]]

Time delay neural network: Difference between revisions