Time delay neural network: Difference between revisions


Revision as of 04:56, 25 May 2025 by Cosmia Nebula (→Overview: arch; Tag: Visual edit)
Latest revision as of 05:39, 13 August 2025 by GünniX (m; v2.05 - Fix errors for CW project (Reference before punctuation - Reference list duplication); Tag: WPCleaner)
(6 intermediate revisions by 5 users not shown)
Line 1:
{{Short description|Neural network architecture}}
[[File:TDNN Diagram.png|thumb|right|TDNN diagram]]

Line 47 ⟶ 48:
=== State of the art ===
TDNN-based phoneme recognizers performed favourably in early comparisons with HMM-based phone models.<ref name="phoneme detection" /><ref name=":3" /> Modern deep TDNN architectures have many more hidden layers and sub-sample or pool connections over broader contexts at higher layers. They achieve up to 50% word error reduction over [[Mixture model|GMM]]-based acoustic models.<ref name=":4">{{cite book |doi=10.21437/Interspeech.2015-647 |doi-access=free |s2cid=8536162 |chapter=A time delay neural network architecture for efficient modeling of long temporal contexts |title=Interspeech 2015 |date=2015 |last1=Peddinti |first1=Vijayaditya |last2=Povey |first2=Daniel |last3=Khudanpur |first3=Sanjeev |pages=3214–3218 }}</ref><ref name=":5">David Snyder, Daniel Garcia-Romero, Daniel Povey, ''[http://danielpovey.com/files/2015_asru_tdnn_ubm.pdf A Time-Delay Deep Neural Network-Based Universal Background Models for Speaker Recognition]'', Proceedings of ASRU 2015.</ref> Although the successive layers of a TDNN are intended to learn features of increasing context width, they still model local contexts.
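As a rough illustration of how sub-sampled connections at higher layers widen the temporal context, the sketch below adds up the frame offsets that each layer splices together. The function name `context_width` and the per-layer offsets are hypothetical, chosen only for illustration; they are not taken from the cited papers.

```python
def context_width(layer_offsets):
    """Return (left, right) input context, in frames, seen by one top-layer output.

    Each layer is described by the frame offsets it splices from the layer
    below. The overall context is the sum of the per-layer extremes.
    """
    left = right = 0
    for offsets in layer_offsets:
        left += -min(offsets)   # furthest look-back added by this layer
        right += max(offsets)   # furthest look-ahead added by this layer
    return left, right


# Hypothetical layer configuration (an assumption, not from the cited papers):
# the first layer uses a full window, higher layers use only two sub-sampled
# offsets that are spaced further apart.
layers = [
    (-2, -1, 0, 1, 2),
    (-1, 2),
    (-3, 3),
    (-7, 2),
]

left, right = context_width(layers)
total = left + right + 1  # +1 for the current frame
print(left, right, total)  # 13 9 23
```

In this sketch each higher layer reads only two frames from the layer below, yet the top-layer output depends on 23 input frames, which is the sense in which sub-sampling covers broader contexts cheaply.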
When longer-distance relationships and pattern sequences must be processed, learning states and state sequences becomes important, and TDNNs can be combined with other modelling techniques.<ref name=":6">{{Cite journal |last1=Haffner |first1=Patrick |last2=Waibel |first2=Alex |date=1991 |title=Multi-State Time Delay Networks for Continuous Speech Recognition |url=https://proceedings.neurips.cc/paper_files/paper/1991/hash/069d3bb002acd8d7dd095917f9efe4cb-Abstract.html |website=proceedings.neurips.cc |volume=4 |publisher=NIPS |pages=135–142}}</ref><ref name=":1" /><ref name=":2" />

TDNN architectures have also been adapted to [[Spiking neural network|spiking neural networks]], leading to state-of-the-art results while lending themselves to energy-efficient [[Neuromorphic chip|hardware implementations]].<ref>{{Cite journal |last=D’Agostino |first=Simone |last2=Moro |first2=Filippo |last3=Torchet |first3=Tristan |last4=Demirağ |first4=Yiğit |last5=Grenouillet |first5=Laurent |last6=Castellani |first6=Niccolò |last7=Indiveri |first7=Giacomo |last8=Vianello |first8=Elisa |last9=Payvand |first9=Melika |date=2024-04-24 |title=DenRAM: neuromorphic dendritic architecture with RRAM for efficient temporal processing with delays |url=https://www.nature.com/articles/s41467-024-47764-w |journal=Nature Communications |language=en |volume=15 |issue=1 |pages=3446 |doi=10.1038/s41467-024-47764-w |issn=2041-1723 |pmc=11043378 }}</ref>

== Applications ==

Line 67 ⟶ 68:
=== Handwriting recognition ===
TDNNs have been used effectively in compact and high-performance [[handwriting recognition]] systems.<ref>{{Cite journal |last=Guyon |first=I. |last2=Albrecht |first2=P. |last3=Le Cun |first3=Y. |last4=Denker |first4=J. |last5=Hubbard |first5=W. |date=1991-01-01 |title=Design of a neural network character recognizer for a touch terminal |url=https://www.sciencedirect.com/science/article/pii/003132039190081F |journal=Pattern Recognition |volume=24 |issue=2 |pages=105–119 |doi=10.1016/0031-3203(91)90081-F |issn=0031-3203 |url-access=subscription }}</ref> Shift-invariance was also adapted to spatial patterns (x/y-axes) in offline handwriting recognition from images.<ref name=":2" />

=== Video analysis ===

Line 85 ⟶ 86:
== References ==
{{reflist}}
~~<ref>~~* {{Cite journal |last1=Haffner |first1=Patrick |last2=Waibel |first2=Alex |date=1991 |orig-date=January 1991 |editor-last=Lippmann |editor-first=Richard |editor2-last=Moody |editor2-first=John |title=Multi-State Time Delay Networks for Continuous Speech Recognition |url=https://www.researchgate.net/publication/221618146 |journal=Advances in Neural Information Processing Systems |publisher=Morgan Kaufmann |volume=4 |pages=135–142}}~~</ref>~~
~~<ref>~~* {{Cite journal |last1=Hampshire |first1=John |last2=Waibel |first2=Alex |orig-date=November 30, 1989 |editor-last=Touretzky |editor-first=David |title=Connectionist Architectures for Multi-Speaker Phoneme Recognition |url=http://papers.nips.cc/paper/213-connectionist-architectures-for-multi-speaker-phoneme-recognition |journal=Advances in Neural Information Processing Systems 2 |date=1990 |pages=203–210}}~~</ref>~~
~~<ref>~~* {{Cite journal |last1=Waibel |first1=Alex |last2=Hanazawa |first2=Toshiyuki |last3=Hinton |first3=Geoffrey |last4=Shikano |first4=Kiyohiro |last5=Lang |first5=Kevin |date=April 1989 |title=Phoneme recognition using time-delay neural networks |url=https://www.researchgate.net/publication/391037926 |journal=IEEE Transactions on Acoustics, Speech, and Signal Processing |volume=37 |issue=3 |pages=328–339 |doi=10.1109/29.21701}}~~</ref>~~
~~<ref>~~* {{Cite journal |last=Waibel |first=Alex |date=December 1987 |title=Phoneme Recognition Using Time-Delay Neural Networks |url=https://www.researchgate.net/publication/391037926 |journal=Conference: Meeting of the Institute of Electrical, Information and Communication Engineers (IEICE) |location=Japan}}~~</ref>~~

[[Category:Neural network architectures]]
[[Category:1987 in artificial intelligence]]