Windows Speech Recognition: Difference between revisions

 
==History==
Microsoft was involved in speech recognition and [[speech synthesis]] research for many years before WSR. In 1993, Microsoft hired [[Xuedong Huang]] from [[Carnegie Mellon University]] to lead its speech development efforts; the company's research led to the development of the [[Speech Application Programming Interface|Speech API]] (SAPI), introduced in 1994.<ref name="TalkingWindowsVista">{{cite web |url=http://msdn2.microsoft.com/en-us/magazine/cc163663.aspx |title=Exploring New Speech Recognition And Synthesis APIs In Windows Vista |last=Brown |first=Robert |publisher=[[Microsoft]] |work=MSDN Magazine |archive-url=https://web.archive.org/web/20080307054756/http://msdn2.microsoft.com/en-us/magazine/cc163663.aspx |archive-date=March 7, 2008 |access-date=June 26, 2015}}</ref> Speech recognition had also been used in previous Microsoft products. [[Office XP]] and [[Microsoft Office 2003|Office 2003]] provided speech recognition capabilities in [[Internet Explorer]] and [[Microsoft Office]] applications;<ref name="SpeechXP">{{cite web |url=https://support.microsoft.com/en-us/kb/306901 |title=How To Use Speech Recognition in Windows XP |publisher=[[Microsoft]] |work=Windows Support |archive-url=https://web.archive.org/web/20150314222444/https://support.microsoft.com/en-us/kb/306901 |archive-date=March 14, 2015 |access-date=May 15, 2020}}</ref> they also enabled limited speech functionality in [[Windows 98]], [[Windows Me]], [[Windows NT 4.0]], and [[Windows 2000]].<ref name="Description">{{cite web |url=https://support.microsoft.com/en-us/kb/278927 |title=Description of the speech recognition and handwriting recognition methods in Word 2002 |publisher=[[Microsoft]] |work=Windows Support |archive-url=https://web.archive.org/web/20150703125056/https://support.microsoft.com/en-us/kb/278927 |archive-date=July 3, 2015 |access-date=March 26, 2018}}</ref> [[Windows XP]] [[Windows XP editions#Tablet PC Edition|Tablet PC Edition]] 2002 included speech recognition capabilities 
with the Tablet PC Input Panel,<ref name="WindowsXPTabletPCEdition">{{cite web |url=http://winsupersite.com/article/windows-xp2/windows-xp-tablet-pc-edition-reviewed-127413 |title=Windows XP Tablet PC Edition Review |last=Thurrott |first=Paul |date=June 25, 2002 |publisher=[[Penton (company)|Penton]] |work=[[Windows IT Pro]] |archive-url=https://web.archive.org/web/20110719201607/http://winsupersite.com/article/windows-xp2/windows-xp-tablet-pc-edition-reviewed-127413 |archive-date=July 19, 2011 |access-date=May 15, 2020}}</ref><ref name="Natural">{{cite web |url=http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWDT05006_WinHEC05.ppt |title=Natural Input On Mobile PC Systems |last=Dresevic |first=Bodin |date=2005 |publisher=[[Microsoft]] |format=PPT |archive-url=https://web.archive.org/web/20051214132222/http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWDT05006_WinHEC05.ppt |archive-date=December 14, 2005 |access-date=May 15, 2020}}</ref> and [[Microsoft Plus!#Microsoft Plus! for Windows XP|Microsoft Plus! for Windows XP]] enabled voice commands for Windows Media Player.<ref name="VoiceCommand">{{cite web |url=http://winsupersite.com/article/product-review/plus-for-windows-xp-review |title=Plus! 
for Windows XP Review |last=Thurrott |first=Paul |date=October 6, 2010 |publisher=[[Penton (company)|Penton]] |work=[[Windows IT Pro]] |archive-url=https://web.archive.org/web/20110705102849/http://winsupersite.com/article/product-review/plus-for-windows-xp-review |archive-date=July 5, 2011 |access-date=May 15, 2020}}</ref> However, these all required installation of speech recognition as a separate component; before Windows Vista, Windows did not include integrated or extensive speech recognition.<ref name="Natural"/> [[Microsoft Office 2007|Office 2007]] and later versions rely on WSR for speech recognition services.<ref name="Office2007SR">{{cite web |url=https://support.office.com/en-us/article/What-happened-to-speech-recognition-c6541b32-82df-4c18-bfa5-c411f45337d3 |title=What happened to speech recognition? |publisher=[[Microsoft]] |work=Office Support |archive-url=https://web.archive.org/web/20161110044211/https://support.office.com/en-us/article/What-happened-to-speech-recognition-c6541b32-82df-4c18-bfa5-c411f45337d3 |archive-date=November 10, 2016 |access-date=May 15, 2020}}</ref>
 
===Windows Vista===
[[File:WindowsVistaPreliminaryWSR.PNG|thumb|right|A prototype speech recognition [[Windows Aero#Aero Wizards|Aero Wizard]] in [[Windows Vista]] (then known as "Longhorn") [[Development of Windows Vista#Milestone 7|build 4093]].]]
At [[Windows Hardware Engineering Conference|WinHEC 2002]] Microsoft announced that Windows Vista (codenamed "Longhorn") would include advances in speech recognition and in features such as [[microphone array]] support<ref name="WinHEC2002">{{cite web |url=https://www.pcmag.com/article2/0,2817,1183143,00.asp |title=WinHEC: The Pregame Show |last=Stam |first=Nick |date=April 16, 2002 |publisher=[[Ziff Davis Media]] |work=[[PC Magazine]] |archive-url=https://web.archive.org/web/20150703193044/https://www.pcmag.com/article2/0,2817,1183143,00.asp |archive-date=July 3, 2015 |access-date=May 15, 2020}}</ref> as part of an effort to "provide a consistent quality audio infrastructure for natural (continuous) speech recognition and (discrete) command and control."<ref name="AudioConsiderations">{{cite web |url=http://download.microsoft.com/download/whistler/WHP/1.0/WXP/EN-US/WH02_AV01.exe |title=Audio Considerations for Voice-Enabled Applications |last=Flandern Van |first=Mike |date=2002 |publisher=[[Microsoft]] |work=[[Windows Hardware Engineering Conference]] |format=EXE |archive-url=https://web.archive.org/web/20020506020208/http://download.microsoft.com/download/whistler/WHP/1.0/WXP/EN-US/WH02_AV01.exe |archive-date=May 6, 2002 |access-date=March 30, 2018}}</ref> [[Bill Gates]] stated during [[Professional Developers Conference|PDC 2003]] that Microsoft would "build speech capabilities into the system — a big advance for that in 'Longhorn,' in both recognition and synthesis, real-time";<ref name="SpeechCapabilities">{{cite web |url=http://www.microsoft.com/billgates/speeches/2003/10-27PDC2003.asp |title=Bill Gates' Web Site — Speech Transcript, Microsoft Professional Developers Conference 2003 |publisher=[[Microsoft]] |date=October 27, 2003 |archive-url=https://web.archive.org/web/20040203152133/http://www.microsoft.com/billgates/speeches/2003/10-27PDC2003.asp |archive-date=February 3, 2004 |access-date=May 15, 2020}}</ref><ref name="SpeechPDC2003">{{cite web 
|url=http://windowsitpro.com/windows-server-2008/live-pdc-2003-day-1-monday |title=Live from PDC 2003: Day 1, Monday |last2=Furman |first2=Keith |last1=Thurrott |first1=Paul |date=October 26, 2003 |publisher=[[Penton (company)|Penton]] |work=[[Windows IT Pro]] |archive-url=https://web.archive.org/web/20130911021829/http://windowsitpro.com/windows-server-2008/live-pdc-2003-day-1-monday |archive-date=September 11, 2013 |access-date=May 15, 2020}}</ref> and pre-release builds during the [[development of Windows Vista]] included a speech engine with training features.<ref name="Windows2006">{{cite web |url=http://www.techhive.com/article/113631/article.html |title=Your Next OS: Windows 2006? |last=Spanbauer |first=Scott |date=December 4, 2003 |publisher=[[International Data Group|IDG]] |work=TechHive |access-date=June 25, 2015}}</ref> A PDC 2003 developer presentation stated Windows Vista would also include a user interface for microphone feedback and control, and user configuration and training features.<ref name="UserInputPDC2003">{{cite web |url=http://download.microsoft.com/download/6/6/9/669C56E3-12AF-48C5-AB2A-E7705F1BE37F/CLI351.ppt |title=Keyboard, Speech, and Pen Input in Your Controls |last2=Chambers |first2=Rob |last1=Gjerstad |first1=Kevin |date=2003 |publisher=[[Microsoft]] |work=[[Professional Developers Conference]] |format=PPT |archive-url=https://web.archive.org/web/20121219161523/http://download.microsoft.com/download/6/6/9/669C56E3-12AF-48C5-AB2A-E7705F1BE37F/CLI351.ppt |archive-date=December 19, 2012 |access-date=March 30, 2018}}</ref> Microsoft clarified the extent to which speech recognition would be integrated when it stated in a pre-release [[software development kit]] that "the common speech scenarios, like speech-enabling menus and buttons, will be enabled system-wide."<ref name="SpeechRecognitionLonghorn">{{cite web |url=http://longhorn.msdn.microsoft.com/lhsdk/speech/speechconcepts.aspx |title=Interacting with the Computer using Speech Input 
and Speech Output |date=2003 |publisher=[[Microsoft]] |work=[[MSDN]] |archive-url=https://web.archive.org/web/20040104193115/http://longhorn.msdn.microsoft.com/lhsdk/speech/speechconcepts.aspx |archive-date=January 4, 2004 |access-date=June 28, 2015}}</ref>
 
During WinHEC 2004 Microsoft included WSR as part of a strategy to improve productivity on mobile PCs.<ref name="MobilePCs">{{cite web |url=http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/SW04023_WINHEC2004.ppt |title=Windows For Mobile PCs And Tablet PCs — CY05 And Beyond |last=Suokko |first=Matti |date=2004 |publisher=[[Microsoft]] |format=PPT |archive-url=https://web.archive.org/web/20051214170817/http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/SW04023_WINHEC2004.ppt |archive-date=December 14, 2005 |access-date=May 15, 2020}}</ref><ref name="MobilePCs04">{{cite web |url=http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/SW04022_WINHEC2004.ppt |title=Windows For Mobile PCs and Tablet PCs — CY04 |last=Fish |first=Darrin |date=2004 |publisher=[[Microsoft]] |format=PPT |archive-url=https://web.archive.org/web/20051214170759/http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/SW04022_WINHEC2004.ppt |archive-date=December 14, 2005 |access-date=May 15, 2020}}</ref> Microsoft later emphasized [[accessibility]], new mobility scenarios, support for additional languages, and improvements to the speech user experience at WinHEC 2005. 
Unlike the speech support included in Windows XP, which was integrated with the Tablet PC Input Panel and required switching between separate Commanding and Dictation modes, Windows Vista would introduce a dedicated interface for speech input on the desktop and would unify the separate speech modes;<ref name="NaturalInput">{{cite web |url=http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWDT05006_WinHEC05.ppt |title=Natural Input on Mobile PC Systems |last=Dresevic |first=Bodin |date=2005 |publisher=[[Microsoft]] |format=PPT |archive-url=https://web.archive.org/web/20051214132222/http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWDT05006_WinHEC05.ppt |archive-date=December 14, 2005 |access-date=May 15, 2020}}</ref> users previously could not speak a command after dictating or vice versa without first switching between these two modes.<ref name="CommandingandDictation">{{cite web |url=http://blogs.msdn.com/b/robch/archive/2005/08/01/446131.aspx |title=Commanding and Dictation — One mode or two in Windows Vista? 
|last=Chambers |first=Rob |date=August 1, 2005 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |access-date=June 30, 2015}}</ref> Windows Vista Beta 1 included integrated speech recognition.<ref name="WindowsVistaBeta1">{{cite web |url=http://winsupersite.com/product-review/windows-vista-beta-1-review-part-3 |title=Windows Vista Beta 1 Review (Part 3) |last=Thurrott |first=Paul |date=October 6, 2010 |publisher=[[Penton (company)|Penton]] |work=[[Windows IT Pro]] |archive-url=https://web.archive.org/web/20140823104310/http://winsupersite.com/product-review/windows-vista-beta-1-review-part-3 |archive-date=August 23, 2014 |access-date=May 15, 2020}}</ref> To incentivize company employees to analyze WSR for software [[software bug|glitch]]es and to provide feedback, Microsoft offered an opportunity for its testers to win a Premium model of the [[Xbox 360]].<ref name="MicrosoftWSRPoster">{{cite web |url=http://www.brian.levy3.net/proj_msft_poster1.html |title=Microsoft Speech Recognition poster |last=Levy |first=Brian |date=2006 |archive-url=https://web.archive.org/web/20061011080004/http://brian.levy3.net/proj_msft_poster1.html |archive-date=October 11, 2006 |access-date=May 15, 2020}}</ref>
 
During a Microsoft demonstration on July 27, 2006, before Windows Vista's [[release to manufacturing]] (RTM), several attempts to dictate led to consecutive output errors that produced the unintended output "Dear aunt, let's set so double the killer delete select all";<ref name="GoodDemos">{{cite web |url=http://blogs.reuters.com/blog/archives/1991 |title=Updated – When good demos go (very, very) bad |last=Auchard |first=Eric |date=July 28, 2006 |publisher=[[Thomson Reuters]] |archive-url=https://web.archive.org/web/20110521230956/http://blogs.reuters.com/blog/archives/1991 |archive-date=May 21, 2011 |url-status=dead |access-date=March 29, 2018}}</ref><ref name="MSNBC">{{cite web|url=http://www.nbcnews.com/id/14158843 |title=Software glitch foils Microsoft demo |date=August 2, 2006 |publisher=[[NBC News]] |archive-url=https://web.archive.org/web/20180328233150/http://www.nbcnews.com/id/14158843/ |archive-date=March 28, 2018 |access-date=May 15, 2020}}</ref> the incident drew significant derision from analysts and journalists in the audience,<ref name="NeedsWork">{{cite web |url=http://www.infoworld.com/article/06/07/31/HNvoicevista_1.html |title=Vista voice-recognition feature needs work |last=Montalbano |first=Elizabeth |date=July 31, 2006 |publisher=[[International Data Group|IDG]] |work=[[InfoWorld]] |archive-url=https://web.archive.org/web/20060805091528/http://www.infoworld.com/article/06/07/31/HNvoicevista_1.html |archive-date=August 5, 2006 |access-date=June 26, 2015}}</ref><ref name="Stammers">{{cite web |url=http://www.techhive.com/article/126613/article.html |title=Vista's Voice Recognition Stammers |last=Montalbano |first=Elizabeth |date=July 31, 2006 |publisher=[[International Data Group|IDG]] |work=TechHive |archive-url=https://web.archive.org/web/20150703154114/http://www.techhive.com/article/126613/article.html |archive-date=July 3, 2015 |access-date=May 15, 2020}}</ref> despite 
another demonstration for application management and navigation being successful.<ref name="GoodDemos"/> Microsoft revealed these issues were due to an audio [[Gain (electronics)|gain]] glitch that caused the recognizer to distort commands and dictations; the glitch was fixed before Windows Vista's release.<ref name="FAM">{{cite web |url=http://blogs.msdn.com/b/robch/archive/2006/07/29/682479.aspx |title=FAM: Vista SR Demo failure — And now you know the rest of the story ... |last=Chambers |first=Rob |date=July 29, 2006 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |archive-url=https://web.archive.org/web/20110522071447/http://blogs.msdn.com/b/robch/archive/2006/07/29/682479.aspx |archive-date=May 22, 2011 |access-date=May 15, 2020}}</ref>
 
WSR allows a user to control applications and the Windows [[desktop metaphor|desktop]] [[user interface]] through voice commands.<ref name="Guide"/> Users can dictate text within documents, email, and forms; control the operating system user interface; perform [[keyboard shortcut]]s; and move the [[cursor (computing)|mouse cursor]].<ref name="CommonCommands">{{cite web |url=http://windows.microsoft.com/en-us/windows/common-speech-recognition-commands#1TC=windows-vista |title=Windows Speech Recognition commands |publisher=[[Microsoft]] |work=Windows Support |access-date=May 15, 2020}}</ref> The majority of integrated applications in Windows Vista can be controlled;<ref name="Guide">{{cite web |url=https://msdn.microsoft.com/en-us/library/bb530325.aspx |title=Windows Vista Speech Recognition Step-by-Step Guide |last=Phillips |first=Todd |date=2007 |publisher=[[Microsoft]] |work=[[MSDN]] |access-date=June 30, 2015}}</ref> third-party applications must support the Text Services Framework for dictation.<ref name="TalkingWindowsVista"/> [[American English|English (U.S.)]], [[British English|English (U.K.)]], [[French language|French]], [[German language|German]], [[Japanese language|Japanese]], [[Mandarin Chinese]], and [[Spanish language|Spanish]] are supported languages.<ref name="SpeechRecognition">{{cite web |url=https://www.microsoft.com/enable/products/windowsvista/speech.aspx |title=Windows Speech Recognition |publisher=[[Microsoft]] |work=Microsoft Accessibility |archive-url=https://web.archive.org/web/20070204044614/https://www.microsoft.com/enable/products/windowsvista/speech.aspx |archive-date=February 4, 2007 |access-date=May 15, 2020}}</ref>
 
When started for the first time, WSR presents a microphone setup wizard and an optional interactive step-by-step tutorial that users can follow to learn basic commands while adapting the recognizer to their specific voice characteristics;<ref name="Guide"/> the tutorial is estimated to take approximately 10 minutes to complete.<ref name="MSR8">{{cite web |url=http://www.pcworld.com/article/3124761/windows/the-windows-weakness-no-one-mentions-speech-recognition.html |title=The Windows weakness no one mentions: Speech recognition |last=Hachman |first=Mark |date=May 10, 2017 |publisher=[[International Data Group|IDG]] |work=[[PC World]] |access-date=March 28, 2018}}</ref> The accuracy of the recognizer increases through regular use, which adapts it to contexts, grammars, patterns, and vocabularies.<ref name="SpeechRecognition"/><ref name="Privacy"/> Custom language models for the specific contexts, phonetics, and terminologies of users in particular occupational fields such as legal or medical are also supported.<ref name="CustomizedVocabularies">{{cite web |url=https://blogs.msdn.microsoft.com/robch/2005/09/20/customized-speech-vocabularies-in-windows-vista/ |title=Customized speech vocabularies in Windows Vista |last=Chambers |first=Rob |date=September 20, 2005 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |access-date=March 29, 2018}}</ref> With [[Windows Search]],<ref name="ThurrottAllchin">{{cite web |url=http://www.itprotoday.com/jim-allchin-talks-windows-vista |title=Jim Allchin Talks Windows Vista |last=Thurrott |first=Paul |date=October 6, 2010 |publisher=[[Penton (company)|Penton]] |work=[[Windows IT Pro]] |archive-url=https://web.archive.org/web/20180328102911/http://www.itprotoday.com/jim-allchin-talks-windows-vista |archive-date=March 28, 2018 |access-date=May 15, 2020}}</ref> the recognizer can also optionally harvest text from documents and email, as well as handwritten [[tablet PC]] input, to contextualize and disambiguate terms to 
improve accuracy; no information is sent to Microsoft.<ref name="Privacy">{{cite web |url=http://download.microsoft.com/download/7/9/4/7945a146-fc32-48c2-8c14-83b1b36696e5/Windows%20Vista%20Privacy%20Statement.rtf |title=Windows Vista Privacy Statement |date=2006 |format=RTF |publisher=[[Microsoft]] |archive-url=https://web.archive.org/web/20080830041216/http://download.microsoft.com/download/7/9/4/7945a146-fc32-48c2-8c14-83b1b36696e5/Windows%20Vista%20Privacy%20Statement.rtf |archive-date=August 30, 2008 |access-date=May 15, 2020}}</ref>
 
WSR is a locally processed speech recognition platform; it does not rely on cloud computing for accuracy, dictation, or recognition.<ref name="MicrosoftPrivacyStatement">{{cite web |url=https://privacy.microsoft.com/en-us/privacystatement |title=Microsoft Privacy Statement |publisher=[[Microsoft]] |access-date=May 12, 2020}}</ref> Speech profiles that store information about users are retained locally.<ref name="Privacy"/> Backups and transfers of profiles can be performed via [[Windows Easy Transfer]].<ref name="Transfer">{{cite web |url=http://blogs.msdn.com/b/robch/archive/2007/02/15/transferring-windows-speech-recognition-profiles-from-one-machine-to-another.aspx |title=Transferring Windows Speech Recognition profiles from one machine to another |last=Chambers |first=Rob |date=February 15, 2007 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |access-date=June 28, 2015}}</ref>