Deep learning: Difference between revisions

But these methods never outperformed the non-uniform, internally handcrafted Gaussian [[mixture model]]/[[Hidden Markov model]] (GMM-HMM) technology based on generative models of speech trained discriminatively.<ref name="Baker2009">{{cite journal | last1 = Baker | first1 = J. | last2 = Deng | first2 = Li | last3 = Glass | first3 = Jim | last4 = Khudanpur | first4 = S. | last5 = Lee | first5 = C.-H. | last6 = Morgan | first6 = N. | last7 = O'Shaughnessy | first7 = D. | year = 2009 | title = Research Developments and Directions in Speech Recognition and Understanding, Part 1 | url = | journal = IEEE Signal Processing Magazine | volume = 26 | issue = 3| pages = 75–80 | doi=10.1109/msp.2009.932166}}</ref>
A number of key difficulties have been analyzed methodologically, including diminishing gradients and a weak temporal correlation structure in the neural predictive models.<ref name="Bengio1991">Y. Bengio (1991). "Artificial Neural Networks and their Application to Speech/Sequence Recognition," Ph.D. thesis, McGill University, Canada.</ref><ref name="Deng1994">{{cite journal | last1 = Deng | first1 = L. | last2 = Hassanein | first2 = K. | last3 = Elmasry | first3 = M. | year = 1994 | title = Analysis of correlation structure for a neural predictive model with applications to speech recognition | url = | journal = Neural Networks | volume = 7 | issue = 2| pages = 331–339 | doi=10.1016/0893-6080(94)90027-2}}</ref>
Additional difficulties were the lack of large training data sets and limited computing power in those early days. Most speech recognition researchers who understood such barriers therefore moved away from neural nets to pursue generative modeling. An exception was at [[SRI International]] in the late 1990s. Funded by the US government's [[National Security Agency|NSA]] and [[DARPA]], SRI conducted research on deep neural networks in speech and speaker recognition. The speaker recognition team, led by [https://www.linkedin.com/in/larryheck Larry Heck], achieved the first significant success with deep neural networks in speech processing, demonstrated in the 1998 [http://www.nist.gov/itl/iad/mig/sre.cfm NIST (National Institute of Standards and Technology) Speaker Recognition evaluation] and later published in the journal ''Speech Communication''.<ref name="Heck2000">{{cite journal | last1 = Heck | first1 = L. | last2 = Konig | first2 = Y. | last3 = Sonmez | first3 = M. | last4 = Weintraub | first4 = M. | year = 2000 | title = Robustness to Telephone Handset Distortion in Speaker Recognition by Discriminative Feature Design | url = | journal = Speech Communication | volume = 31 | issue = 2| pages = 181–192}}</ref> While SRI achieved success with deep neural networks in speaker recognition, it was unable to demonstrate similar success in speech recognition. Hinton et al. and Deng et al. reviewed part of this history, describing how their collaboration with each other, and then with colleagues across groups, re-ignited neural network research and initiated deep learning research and applications in speech recognition.<ref name=HintonDengYu2012/><ref name="ReferenceICASSP2013">{{cite journal|last1=Deng|first1=L.|last2=Hinton|first2=G.|last3=Kingsbury|first3=B.|title=New types of deep neural network learning for speech recognition and related applications: An overview (ICASSP)| date=2013}}</ref><ref name="HintonKeynoteICASSP2013">Keynote talk: Recent Developments in Deep Neural Networks. ICASSP, 2013 (by Geoff Hinton).</ref><ref name="interspeech2014Keynote">Keynote talk: "Achievements and Challenges of Deep Learning - From Speech Analysis and Recognition To Language and Multimodal Processing," Interspeech, September 2014.</ref>
 
The term "deep learning" gained traction in the mid-2000s after a publication by [[Geoffrey Hinton]] and Ruslan Salakhutdinov showed how a many-layered [[feedforward neural network]] could be effectively pre-trained one layer at a time, treating each layer in turn as an [[unsupervised learning|unsupervised]] [[restricted Boltzmann machine]], then fine-tuning it using [[supervised learning|supervised]] [[backpropagation]].<ref name="HINTON2007">G. E. Hinton., "Learning multiple layers of representation," ''Trends in Cognitive Sciences'', 11, pp. 428–434, 2007.</ref> In 1992, Schmidhuber had already implemented a very similar idea for the more general case of unsupervised deep hierarchies of [[recurrent neural network]]s, and also experimentally shown its benefits for speeding up supervised learning.<ref name="SCHMID1992">J. Schmidhuber., "Learning complex, extended sequences using the principle of history compression," ''Neural Computation'', 4, pp. 234–242, 1992.</ref><ref name="SCHMID1991">J. Schmidhuber., "My First Deep Learning System of 1991 + Deep Learning Timeline 1962–2013."</ref>