Time delay neural network: Difference between revisions

=== State of the art ===
 
TDNN-based phoneme recognizers compared favourably in early comparisons with HMM-based phone models.<ref name="phoneme detection" /><ref name=":3" /> Modern deep TDNN architectures include many more hidden layers and sub-sample or pool connections over broader contexts at higher layers. They achieve up to 50% word error reduction over [[Mixture model|GMM]]-based acoustic models.<ref name=":4">Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur, ''A time delay neural network architecture for efficient modeling of long temporal contexts'', Proceedings of Interspeech 2015</ref><ref name=":5">David Snyder, Daniel Garcia-Romero, Daniel Povey, ''Time-Delay Deep Neural Network-Based Universal Background Models for Speaker Recognition'', Proceedings of ASRU 2015.</ref> While the different layers of TDNNs are intended to learn features of increasing context width, they still model only local contexts. When longer-distance relationships and pattern sequences have to be processed, learning states and state sequences is important, and TDNNs can be combined with other modelling techniques.<ref name=":6">Patrick Haffner, Alexander Waibel, ''Multi-State Time Delay Neural Networks for Continuous Speech Recognition'', Advances in Neural Information Processing Systems, 1992, Morgan Kaufmann.</ref><ref name=":1" /><ref name=":2" />
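The growth of context width across stacked layers can be illustrated with a small sketch. Assuming a hypothetical splice configuration in the style of Peddinti et al. (2015), the overall temporal receptive field of a stacked TDNN is simply the sum of each layer's relative frame offsets:

```python
def total_context(splices):
    """Return the overall (left, right) temporal context of a stacked TDNN,
    where each layer splices its input frames at the given relative offsets."""
    left = right = 0
    for offsets in splices:
        left += min(offsets)   # furthest look-back added by this layer
        right += max(offsets)  # furthest look-ahead added by this layer
    return left, right

# Hypothetical configuration: a dense first layer over 5 consecutive
# frames, then sub-sampled, progressively wider offsets at higher layers.
print(total_context([(-2, -1, 0, 1, 2), (-1, 2), (-3, 3), (-7, 2)]))
# → (-13, 9): the stack sees 13 frames of left and 9 of right context.
```

Each higher layer widens the context without densely connecting to every frame, which is what keeps deep TDNNs efficient.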
 
==Applications==
=== Lip-reading – audio-visual speech ===
 
TDNNs were also successfully used in early demonstrations of audio-visual speech recognition, where the sounds of speech are complemented by visually reading lip movement.<ref name=":7" /> Here, TDNN-based recognizers used visual and acoustic features jointly to achieve improved recognition accuracy, particularly in the presence of noise, where complementary information from the alternate modality could be fused effectively in a neural net.
 
=== Handwriting recognition ===
TDNNs have been used effectively in compact and high-performance handwriting recognition systems. Shift-invariance was also adapted to spatial patterns (x/y-axes) in image offline handwriting recognition.<ref name=":2" />
 
=== Video analysis ===
 
Video has a temporal dimension that makes a TDNN an ideal solution to analysing motion patterns. An example of this analysis is the combination of vehicle detection and pedestrian recognition.<ref>Christian Woehler and Joachim K. Anlauf, ''Real-time object recognition on image sequences with the adaptable time delay neural network algorithm—applications for autonomous vehicles'', Image and Vision Computing 19.9 (2001): 593–618.</ref> When examining videos, subsequent images are fed into the TDNN as input, each image being the next frame in the video. The strength of the TDNN comes from its ability to recognize objects shifted forward and backward in time, so that an object remains detectable wherever it appears within the frame sequence. If an object can be recognized in this manner, an application can anticipate where the object will be found in the future and perform an optimal action.
Two-dimensional TDNNs were later applied to other image-recognition tasks under the name of "[[Convolutional neural network|convolutional neural networks]]", where shift-invariant training is applied to the x/y axes of an image.
 
=== Common libraries ===
 
*TDNNs can be implemented in virtually all machine-learning frameworks using one-dimensional [[convolutional neural network]]s, due to the equivalence of the methods.
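As a minimal sketch of this equivalence (plain NumPy, hypothetical layer sizes, and a ReLU nonlinearity chosen for illustration), a TDNN layer applies one shared weight matrix to every window of consecutive frames, which is exactly a one-dimensional convolution over time:

```python
import numpy as np

def tdnn_layer(frames, weights, bias):
    """One TDNN layer: the same weights are applied to every window of k
    consecutive frames, i.e. a 1-D convolution along the time axis.

    frames:  (T, d_in)        sequence of T input feature frames
    weights: (k, d_in, d_out) weights shared across all time positions
    bias:    (d_out,)
    returns: (T - k + 1, d_out)
    """
    T, d_in = frames.shape
    k, _, d_out = weights.shape
    out = np.empty((T - k + 1, d_out))
    for t in range(T - k + 1):
        window = frames[t:t + k]  # (k, d_in) context window at time t
        out[t] = np.tensordot(window, weights, axes=([0, 1], [0, 1])) + bias
    return np.maximum(out, 0.0)   # ReLU (a hypothetical choice of activation)
```

Because the weights are shared across time, shifting the input sequence simply shifts the output, which is the shift-invariance property that makes the TDNN and the 1-D convolutional layer the same operation.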