== History ==
The TDNN was introduced in the late 1980s and applied to [[phoneme]] classification for automatic [[speech recognition]], where the automatic determination of precise segment or feature boundaries in the speech signal was difficult or impossible. Because the TDNN recognizes phonemes and their underlying acoustic/phonetic features independently of their position in time, it improved performance over static classification.<ref name="phoneme detection" /><ref name=":0">Alexander Waibel, ''[http://www.inf.ufrgs.br/~engel/data/media/file/cmp121/waibel89_TDNN.pdf Phoneme Recognition Using Time-Delay Neural Networks]'', SP87-100, Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE), December 1987, Tokyo, Japan.</ref> It was also applied to two-dimensional signals (time–frequency patterns in speech,<ref name=":1">John B. Hampshire and Alexander Waibel, ''[http://papers.nips.cc/paper/213-connectionist-architectures-for-multi-speaker-phoneme-recognition.pdf Connectionist Architectures for Multi-Speaker Phoneme Recognition] {{Webarchive|url=https://web.archive.org/web/20160411092444/http://papers.nips.cc/paper/213-connectionist-architectures-for-multi-speaker-phoneme-recognition.pdf |date=2016-04-11 }}'', Advances in Neural Information Processing Systems, 1990, Morgan Kaufmann.</ref> and coordinate-space patterns in OCR<ref name=":2">Stefan Jaeger, Stefan Manke, Juergen Reichert, Alexander Waibel, ''[https://www.researchgate.net/profile/Stefan_Jaeger/publication/220163530_Online_handwriting_recognition_the_NPen_recognizer_Int_J_Doc_Anal_Recognit_3169-180/links/0c96051af3e6133ed0000000.pdf Online handwriting recognition: the NPen++ recognizer]'', International Journal on Document Analysis and Recognition, Vol. 3, Issue 3, March 2001.</ref>).
=== Max pooling ===
In 1990, Yamaguchi et al. introduced the concept of max pooling, combining it with TDNNs to realize a speaker-independent isolated word recognition system.<ref name=Yamaguchi111990>{{cite conference |title=A Neural Network for Speaker-Independent Isolated Word Recognition |last1=Yamaguchi |first1=Kouichi |last2=Sakamoto |first2=Kenji |last3=Akabane |first3=Toshio |last4=Fujimoto |first4=Yoshiji |date=November 1990 |___location=Kobe, Japan |conference=First International Conference on Spoken Language Processing (ICSLP 90) |url=https://www.isca-speech.org/archive/icslp_1990/i90_1077.html |access-date=2019-09-04 |archive-date=2021-03-07 |archive-url=https://web.archive.org/web/20210307233750/https://www.isca-speech.org/archive/icslp_1990/i90_1077.html |url-status=dead }}</ref>
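The operation can be illustrated with a short sketch (a minimal NumPy example, not taken from the cited paper; the function name <code>max_pool_time</code> is illustrative): each output frame keeps only the per-feature maximum over a window of consecutive input frames, discarding exact timing within the window.

```python
import numpy as np

def max_pool_time(frames, pool_size):
    """Max-pool a sequence of feature frames along the time axis.

    frames: array of shape (T, F) -- T time steps, F features.
    Returns an array of shape (T // pool_size, F) holding, for each
    feature, the maximum over each group of pool_size input frames.
    """
    T, F = frames.shape
    T_out = T // pool_size
    # Drop any trailing frames that do not fill a complete window,
    # then group the time axis into windows of pool_size frames.
    trimmed = frames[:T_out * pool_size].reshape(T_out, pool_size, F)
    return trimmed.max(axis=1)

x = np.array([[0.1, 0.9],
              [0.4, 0.2],
              [0.8, 0.3],
              [0.2, 0.7]])
print(max_pool_time(x, 2))
# first output frame: per-feature max over frames 0-1 -> [0.4, 0.9]
# second output frame: per-feature max over frames 2-3 -> [0.8, 0.7]
```

Because only the strongest activation in each window survives, small shifts of a feature in time leave the pooled output unchanged, which is the property that made pooling useful for shift-invariant word recognition.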
== Overview ==
=== State of the art ===
In early comparisons, TDNN-based phoneme recognizers performed favourably against HMM-based phone models.<ref name="phoneme detection" /><ref name=":3" /> Modern deep TDNN architectures include many more hidden layers and sub-sample or pool connections over broader contexts at higher layers. They achieve up to 50% word error reduction over [[Mixture model|GMM]]-based acoustic models.<ref name=":4">Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur, ''[https://web.archive.org/web/20180306041537/https://pdfs.semanticscholar.org/ced2/11de5412580885279090f44968a428f1710b.pdf A time delay neural network architecture for efficient modeling of long temporal contexts]'', Proceedings of Interspeech 2015.</ref><ref name=":5">David Snyder, Daniel Garcia-Romero, Daniel Povey, ''[http://danielpovey.com/files/2015_asru_tdnn_ubm.pdf A Time-Delay Deep Neural Network-Based Universal Background Models for Speaker Recognition]'', Proceedings of ASRU 2015.</ref> While the successive layers of a TDNN are intended to learn features of increasing context width, they nevertheless model only local contexts. When longer-distance relationships and pattern sequences have to be processed, learning states and state sequences becomes important, and TDNNs can be combined with other modelling techniques.<ref name=":6">Patrick Haffner, Alexander Waibel, ''[http://papers.nips.cc/paper/580-multi-state-time-delay-networks-for-continuous-speech-recognition.pdf Multi-State Time Delay Neural Networks for Continuous Speech Recognition] {{Webarchive|url=https://web.archive.org/web/20160411090850/http://papers.nips.cc/paper/580-multi-state-time-delay-networks-for-continuous-speech-recognition.pdf |date=2016-04-11 }}'', Advances in Neural Information Processing Systems, 1992, Morgan Kaufmann.</ref><ref name=":1" /><ref name=":2" />
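The basic building block behind these architectures, a single time-delay layer, can be sketched as follows (a minimal NumPy illustration under simplified assumptions, not code from the cited systems; the function name <code>tdnn_layer</code> and the choice of delays are illustrative). Each output frame is computed from the input frames at a fixed set of relative delays, with one weight matrix per delay shared across all time positions, which is what makes the layer shift-invariant.

```python
import numpy as np

def tdnn_layer(frames, weights, bias, delays):
    """One time-delay layer over a sequence of feature frames.

    frames:  (T, F_in)                input feature frames
    weights: (len(delays), F_in, F_out)  one weight matrix per delay,
                                      shared across all time positions
    bias:    (F_out,)
    delays:  relative time offsets, e.g. [0, 1, 2] for a
             three-frame context window
    """
    T = frames.shape[0]
    T_out = T - max(delays)  # only positions with a full context window
    out = np.zeros((T_out, weights.shape[2]))
    for i, d in enumerate(delays):
        # Accumulate the contribution of the frame d steps ahead,
        # for every output time position at once.
        out += frames[d:d + T_out] @ weights[i]
    return np.tanh(out + bias)
```

Stacking such layers, with higher layers using wider-spaced delays (e.g. [0, 3] instead of [0, 1, 2]), lets each successive layer see a broader temporal context while keeping the number of connections per layer small, which is the sub-sampling idea used in modern deep TDNNs.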
== Applications ==