Windows Speech Recognition: Difference between revisions

m Reverted 1 edit by 68.203.140.185 (talk) to last revision by Ian Wolfman (TW)
General edits as part of the GA review.
Line 20:
===Windows Vista===
[[File:WindowsVistaPreliminaryWSR.PNG|thumb|right|A prototype speech recognition [[Windows Aero#Aero Wizards|Aero Wizard]] in [[Windows Vista]] (then known as "Longhorn") [[Development of Windows Vista#Milestone 7|build 4093]].]]
At [[Windows Hardware Engineering Conference|WinHEC 2002]] Microsoft announced that Windows Vista (codenamed "Longhorn") would include advances in speech recognition and in features such as [[microphone array]] support<ref name="WinHEC2002">{{cite web |url=https://www.pcmag.com/article2/0,2817,1183143,00.asp |title=WinHEC: The Pregame Show |last=Stam |first=Nick |date=April 16, 2002 |publisher=[[Ziff Davis Media]] |work=[[PC Magazine]] |accessdate=June 26, 2015}}</ref> as part of an effort to "provide a consistent quality audio infrastructure for natural (continuous) speech recognition and (discrete) command and control."<ref name="AudioConsiderations">{{cite web |url=http://download.microsoft.com/download/whistler/WHP/1.0/WXP/EN-US/WH02_AV01.exe |title=Audio Considerations for Voice-Enabled Applications |last=Flandern Van |first=Mike |date=2002 |publisher=[[Microsoft]] |work=[[Windows Hardware Engineering Conference]] |format=EXE |archiveurl=https://web.archive.org/web/20020506020208/http://download.microsoft.com/download/whistler/WHP/1.0/WXP/EN-US/WH02_AV01.exe |archivedate=May 6, 2002 |accessdate=March 30, 2018}}</ref> [[Bill Gates]] stated during [[Professional Developers Conference|PDC 2003]] that Microsoft would "build speech capabilities into the system — a big advance for that in 'Longhorn,' in both recognition and synthesis, real-time";<ref name="SpeechCapabilities">{{cite web |url=http://www.microsoft.com/billgates/speeches/2003/10-27PDC2003.asp |title=Bill Gates' Web Site - Speech Transcript, Microsoft Professional Developers Conference 2003 |author=[[Microsoft]] |date=October 27, 2003 |archiveurl=https://web.archive.org/web/20040203152133/http://www.microsoft.com/billgates/speeches/2003/10-27PDC2003.asp |archivedate=February 3, 2004 |accessdate=June 26, 2015}}</ref><ref name="SpeechPDC2003">{{cite web |url=http://windowsitpro.com/windows-server-2008/live-pdc-2003-day-1-monday |title=Live from PDC 2003: Day 1, Monday |last2=Furman |first2=Keith 
|last=Thurrott |first=Paul |date=October 26, 2003 |publisher=[[Penton (company)|Penton]] |work=[[Windows IT Pro]] |accessdate=June 26, 2015}}</ref> and pre-release builds during the [[development of Windows Vista]] included a speech engine with training features.<ref name="Windows2006">{{cite web |url=http://www.techhive.com/article/113631/article.html |title=Your Next OS: Windows 2006? |last=Spanbauer |first=Scott |date=December 4, 2003 |publisher=[[International Data Group|IDG]] |work=TechHive |accessdate=June 25, 2015}}</ref> A PDC 2003 developer presentation stated Windows Vista would also include a user interface for microphone feedback and control, and user configuration and training features.<ref name="UserInputPDC2003">{{cite web |url=http://download.microsoft.com/download/6/6/9/669C56E3-12AF-48C5-AB2A-E7705F1BE37F/CLI351.ppt |title=Keyboard, Speech, and Pen Input in Your Controls |last2=Chambers |first2=Rob |last1=Gjerstad |first=Kevin |date=2003 |publisher=[[Microsoft]] |work=[[Professional Developers Conference]] |format=PPT |archiveurl=https://web.archive.org/web/20121219161523/http://download.microsoft.com/download/6/6/9/669C56E3-12AF-48C5-AB2A-E7705F1BE37F/CLI351.ppt |archivedate=December 19, 2012 |accessdate=March 30, 2018}}</ref> Microsoft clarified the extent to which speech recognition would be integrated when it stated in a pre-release [[software development kit]] that "the common speech scenarios, like speech-enabling menus and buttons, will be enabled system-wide."<ref name="SpeechRecognitionLonghorn">{{cite web |url=http://longhorn.msdn.microsoft.com/lhsdk/speech/speechconcepts.aspx |title=Interacting with the Computer using Speech Input and Speech Output |author=[[Microsoft]] |date=2003 |work=[[MSDN]] |archiveurl=https://web.archive.org/web/20040104193115/http://longhorn.msdn.microsoft.com/lhsdk/speech/speechconcepts.aspx |archivedate=January 4, 2004 |accessdate=June 28, 2015}}</ref>
 
During WinHEC 2004 Microsoft included WSR as part of a strategy to improve productivity on mobile PCs.<ref name="MobilePCs">{{cite web |url=http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/SW04023_WINHEC2004.ppt |title=Windows For Mobile PCs And Tablet PCs - CY05 And Beyond |last=Suokko |first=Matti |date=2004 |publisher=[[Microsoft]] |archiveurl=https://web.archive.org/web/20051214170817/http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/SW04023_WINHEC2004.ppt |archivedate=December 14, 2005 |format=PPT |accessdate=July 15, 2015}}</ref><ref name="MobilePCs04">{{cite web |url=http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/SW04022_WINHEC2004.ppt |title=Windows For Mobile PCs and Tablet PCs - CY04 |last=Fish |first=Darrin |date=2004 |publisher=[[Microsoft]] |archiveurl=https://web.archive.org/web/20051214170759/http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/SW04022_WINHEC2004.ppt |archivedate=December 14, 2005 |format=PPT |accessdate=July 15, 2015}}</ref> Microsoft later emphasized [[accessibility]], new mobility scenarios, support for additional languages, and improvements to the speech user experience at WinHEC 2005. 
Unlike the speech support included in Windows XP, which was integrated with the Tablet PC Input Panel and required switching between separate Commanding and Dictation modes, Windows Vista would introduce a dedicated interface for speech input on the desktop and would unify the separate speech modes;<ref name="NaturalInput">{{cite web |url=http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWDT05006_WinHEC05.ppt |title=Natural Input on Mobile PC Systems |last=Dresevic |first=Bodin |date=2005 |publisher=[[Microsoft]] |format=PPT |archiveurl=https://web.archive.org/web/20051214132222/http://download.microsoft.com/download/9/8/f/98f3fe47-dfc3-4e74-92a3-088782200fe7/TWDT05006_WinHEC05.ppt |archivedate=December 14, 2005 |accessdate=March 29, 2018}}</ref> users previously could not speak a command after dictating or vice versa without first switching between these two modes.<ref name="CommandingandDictation">{{cite web |url=http://blogs.msdn.com/b/robch/archive/2005/08/01/446131.aspx |title=Commanding and Dictation - One mode or two in Windows Vista? 
|last=Chambers |first=Rob |date=August 1, 2005 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |accessdate=June 30, 2015}}</ref> Windows Vista Beta 1 included integrated speech recognition.<ref name="WindowsVistaBeta1">{{cite web |url=http://winsupersite.com/product-review/windows-vista-beta-1-review-part-3 |title=Windows Vista Beta 1 Review (Part 3) |last=Thurrott |first=Paul |authorlink=Paul Thurrott |date=October 6, 2010 |publisher=[[Penton (company)|Penton]] |work=[[Windows IT Pro]] |accessdate=June 26, 2015}}</ref> To incentivize company employees to analyze WSR for software [[software bug|glitch]]es and to provide feedback, Microsoft offered an opportunity for its testers to win a Premium model of the [[Xbox 360]].<ref name="MicrosoftWSRPoster">{{cite web |url=http://www.brian.levy3.net/proj_msft_poster1.html |title=Microsoft Speech Recognition poster |last=Levy |first=Brian |date=2006 |archiveurl=https://web.archive.org/web/20061011080004/http://brian.levy3.net/proj_msft_poster1.html |archivedate=October 11, 2006 |accessdate=March 17, 2016}}</ref>
 
During a Microsoft demonstration on July 27, 2006, before Windows Vista's [[release to manufacturing]] (RTM), WSR produced the unintended output "Dear aunt, let's set so double the killer delete select all" after several attempts to dictate led to consecutive output errors;<ref name="GoodDemos">{{cite web |url=http://blogs.reuters.com/blog/archives/1991 |title=UPDATED-When good demos go (very, very) bad |last=Auchard |first=Eric |date=July 28, 2006 |publisher=[[Thomson Reuters]] |archiveurl=https://web.archive.org/web/20110521230956/http://blogs.reuters.com/blog/archives/1991 |archivedate=May 21, 2011 |accessdate=March 29, 2018}}</ref><ref name="MSNBC">{{cite web|url=http://www.nbcnews.com/id/14158843 |title=Software glitch foils Microsoft demo |author=[[NBC News]] |date=August 2, 2006 |publisher=[[Associated Press]] |accessdate=June 30, 2015 }}</ref> the incident drew significant derision from analysts and journalists in the audience.<ref name="NeedsWork">{{cite web |url=http://www.infoworld.com/article/06/07/31/HNvoicevista_1.html |title=Vista voice-recognition feature needs work |last=Montalbano |first=Elizabeth |date=July 31, 2006 |publisher=[[International Data Group|IDG]] |work=[[InfoWorld]] |archiveurl=https://web.archive.org/web/20060805091528/http://www.infoworld.com/article/06/07/31/HNvoicevista_1.html |archivedate=August 5, 2006 |accessdate=June 26, 2015}}</ref><ref name="Stammers">{{cite web |url=http://www.techhive.com/article/126613/article.html |title=Vista's Voice Recognition Stammers |last=Montalbano |first=Elizabeth |date=July 31, 2006 |publisher=[[International Data Group|IDG]] |work=TechHive |accessdate=July 1, 2015}}</ref> Microsoft later revealed that these issues were due to an audio [[Gain (electronics)|gain]] glitch that caused the speech recognizer to distort the dictated words;<ref name="FAM">{{cite web |url=http://blogs.msdn.com/b/robch/archive/2006/07/29/682479.aspx 
|title=FAM: Vista SR Demo failure -- And now you know the rest of the story ... |last=Chambers |first=Rob |date=July 29, 2006 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |accessdate=June 26, 2015}}</ref> the glitch was fixed before Windows Vista's release.<ref name="FAM"/>
 
Reports from early 2007 indicated that WSR was vulnerable to attacks in which malicious operations are triggered by playing certain audio commands through a target's speakers;<ref name="SpeechRecognitionHole">{{cite web |url=http://news.bbc.co.uk/2/hi/technology/6320865.stm |title=Vista has speech recognition hole |date=February 1, 2007 |publisher=[[British Broadcasting Corporation|BBC]] |work=[[BBC News]] |accessdate=March 29, 2018}}</ref><ref name="RemoteExploit">{{cite web |url=https://www.engadget.com/2007/02/01/remote-exploit-of-vista-speech-reveals-fatal-flaw/ |title=Remote 'exploit' of Vista Speech reveals fatal flaw |last=Miller |first=Paul |date=February 1, 2007 |publisher=[[AOL]] |work=[[Engadget]] |accessdate=June 28, 2015}}</ref> it was the first vulnerability discovered after Windows Vista's [[Software release life cycle#General availability|general availability]].<ref name="PCWorld">{{cite web |url=http://www.pcworld.com/article/id,128737-c,vistalonghorn/article.html |title=Honeymoon's Over: First Windows Vista Flaw |last=Roberts |first=Paul |date=February 1, 2007 |publisher=[[International Data Group|IDG]] |work=[[PCWorld]] |archiveurl=https://web.archive.org/web/20070204030144/http://www.pcworld.com/article/id,128737-c,vistalonghorn/article.html |archivedate=February 4, 2007 |accessdate=June 28, 2015}}</ref> Microsoft stated that although such an attack is theoretically possible, a number of mitigating factors and prerequisites would limit its effectiveness or prevent it altogether: a target would need the recognizer to be active and configured to properly interpret such commands; microphones and speakers would both need to be enabled and at sufficient volume levels; and an attack would require the computer to perform visible operations and produce audible feedback without users noticing. 
[[User Account Control]] would also prohibit the occurrence of privileged operations.<ref name="SpeechIssue">{{cite web |url=https://blogs.technet.microsoft.com/msrc/2007/01/31/issue-regarding-windows-vista-speech-recognition/ |title=Issue regarding Windows Vista Speech Recognition |date=January 31, 2007 |publisher=[[Microsoft]] |work=[[Microsoft TechNet|TechNet]] |archive-url=https://web.archive.org/web/20160520045703/https://blogs.technet.microsoft.com/msrc/2007/01/31/issue-regarding-windows-vista-speech-recognition/ |url-status=dead |archivedate=May 20, 2016 |accessdate=March 31, 2018}}</ref>
Line 57:
[[File:WSR-AlternatesPanel.png|thumb|200px|right|The alternates panel displaying suggestions for a phrase.]]
====Alternates panel====
An alternates panel disambiguation interface lists items interpreted as being relevant to a user's spoken word(s); if the word or phrase that a user desired to insert into an application is listed among results, a user can speak the corresponding number of the word or phrase in the results and confirm this choice by speaking "OK" to insert it within the application.<ref name="Modes">{{cite web |url=http://blogs.msdn.com/b/robch/archive/2007/11/19/speech-macros-typing-mode-and-spelling-mode-in-windows-speech-recognition.aspx |title=Speech Macros, Typing Mode and Spelling Mode in Windows Speech Recognition |last=Chambers |first=Rob |date=November 19, 2007 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |accessdate=August 25, 2015}}</ref> The alternates panel also appears when launching applications or speaking commands that refer to more than one item (e.g., speaking "Start Internet Explorer" may list both the web browser and a separate version with add-ons disabled). An ''ExactMatchOverPartialMatch'' entry in the [[Windows Registry]] can limit commands to items with exact names if there is more than one instance included in results.<ref name="Clarification">{{cite web |url=http://blogs.msdn.com/b/robch/archive/2007/05/07/windows-speech-recognition-exactmatchoverpartialmatch.aspx |title=Windows Speech Recognition - ExactMatchOverPartialMatch |last=Chambers |first=Rob |date=May 7, 2007 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |accessdate=August 24, 2015}}</ref>
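
The effect of preferring exact matches over partial ones can be illustrated with a minimal sketch. This is not Microsoft's implementation: the function name <code>resolve_command</code>, the substring-based matching rule, and the sample item names are assumptions made purely for illustration, and the boolean parameter merely stands in for the ''ExactMatchOverPartialMatch'' entry.

```python
def resolve_command(spoken, item_names, exact_match_over_partial=False):
    """Return the items a disambiguation list would show for `spoken`.

    Hypothetical model: an item is a candidate when its name contains
    the spoken phrase (case-insensitive). This matching rule is an
    assumption, not the documented behavior of WSR.
    """
    phrase = spoken.casefold()
    # Partial matching: every item whose name contains the phrase.
    matches = [name for name in item_names if phrase in name.casefold()]
    if exact_match_over_partial and len(matches) > 1:
        # With the preference enabled, keep only exact-name matches,
        # provided at least one exists among the candidates.
        exact = [name for name in matches if name.casefold() == phrase]
        if exact:
            return exact
    return matches


# Hypothetical item names, loosely following the example in the text.
items = ["Internet Explorer", "Internet Explorer (No Add-ons)", "Notepad"]
print(resolve_command("Internet Explorer", items))
print(resolve_command("Internet Explorer", items, exact_match_over_partial=True))
```

In this sketch the first call returns both Internet Explorer entries, so a numbered list would be needed, while the second call returns only the exactly named item and no disambiguation step is required.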
 
===Common commands===
Line 80:
 
====Speech dictionary====
A personal dictionary allows users to include or exclude certain words or expressions from dictation.<ref name="CustomizedVocabularies"/> When a user adds a word beginning with a capital letter to the dictionary, the user can specify whether it should always be capitalized or whether capitalization depends on the context in which the word is spoken. Users can also record pronunciations for words added to the dictionary to increase recognition accuracy; words written via a [[stylus]] on a [[tablet PC]] for the Windows [[handwriting recognition]] feature are also stored. Information stored within a dictionary is included as part of a user's speech profile.<ref name="Privacy"/> Users can open the speech dictionary by speaking the "show speech dictionary" command.
 
===Macros===
Line 90:
 
==Performance==
{{As of|2017}}, WSR uses Microsoft Speech Recognizer 8.0, the version introduced in Windows Vista. In dictation testing by Mark Hachman, a senior editor of ''[[PC World]]'', it was 93.6% accurate without training, a rate lower than that of competing software. According to Microsoft, the rate of accuracy when trained is 99%. Hachman opined that Microsoft does not publicly discuss the feature because of the 2006 incident during the development of Windows Vista, with the result that few users knew that documents could be dictated within Windows before the introduction of [[Cortana]].<ref name="MSR8"/>
 
==See also==