These methods, however, never won out over the non-uniform, internally handcrafted Gaussian [[mixture model]]/[[Hidden Markov model]] (GMM-HMM) technology, which is based on generative models of speech trained discriminatively.<ref name="Baker2009">{{cite journal | last1 = Baker | first1 = J. | last2 = Deng | first2 = Li | last3 = Glass | first3 = Jim | last4 = Khudanpur | first4 = S. | last5 = Lee | first5 = C.-H. | last6 = Morgan | first6 = N. | last7 = O'Shaughnessy | first7 = D. | year = 2009 | title = Research Developments and Directions in Speech Recognition and Understanding, Part 1 | journal = IEEE Signal Processing Magazine | volume = 26 | issue = 3 | pages = 75–80 | doi = 10.1109/msp.2009.932166}}</ref>
A number of key difficulties have been analyzed methodologically, including diminishing gradients and the weak temporal correlation structure of neural predictive models.<ref name="Bengio1991">Y. Bengio (1991). "Artificial Neural Networks and their Application to Speech/Sequence Recognition," Ph.D. thesis, McGill University, Canada.</ref><ref name="Deng1994">{{cite journal | last1 = Deng | first1 = L. | last2 = Hassanein | first2 = K. | last3 = Elmasry | first3 = M. | year = 1994 | title = Analysis of correlation structure for a neural predictive model with applications to speech recognition | journal = Neural Networks | volume = 7 | issue = 2 | pages = 331–339 | doi = 10.1016/0893-6080(94)90027-2}}</ref>
Additional difficulties were the lack of large training data sets and the limited computing power of that era. As a result, most speech recognition researchers who understood these barriers moved away from neural nets to pursue generative modeling. An exception was at [[SRI International]] in the late 1990s. Funded by the US government's [[
The term "deep learning" gained traction in the mid-2000s after a publication by [[Geoffrey Hinton]] and Ruslan Salakhutdinov showed how a many-layered [[feedforward neural network]] could be effectively pre-trained one layer at a time, treating each layer in turn as an [[unsupervised learning|unsupervised]] [[restricted Boltzmann machine]], then fine-tuning it using [[supervised learning|supervised]] [[backpropagation]].<ref name="HINTON2007">G. E. Hinton., "Learning multiple layers of representation," ''Trends in Cognitive Sciences'', 11, pp. 428–434, 2007.</ref> In 1992, Schmidhuber had already implemented a very similar idea for the more general case of unsupervised deep hierarchies of [[recurrent neural network]]s, and also experimentally shown its benefits for speeding up supervised learning.<ref name="SCHMID1992">J. Schmidhuber., "Learning complex, extended sequences using the principle of history compression," ''Neural Computation'', 4, pp. 234–242, 1992.</ref><ref name="SCHMID1991">J. Schmidhuber., "My First Deep Learning System of 1991 + Deep Learning Timeline 1962–2013."</ref>