{{Short description|Speech recognition software}}
{{good article}}
{{Infobox software
| logo =
| name = Windows Speech Recognition
| logo size = 64px
|
| screenshot_size = 300px
| caption = The tutorial for Windows Speech Recognition in [[Windows Vista]] depicting the selection of text in [[WordPad]] for deletion.
| developer
| released
| operating system = [[Windows Vista]] and later
| genre
}}
'''Windows Speech Recognition''' ('''WSR''') is [[speech recognition]] developed by [[Microsoft]] for [[Windows Vista]] that enables [[hands-free computing|voice commands]] to control the [[desktop metaphor|desktop]] [[user interface]]
WSR is a locally processed speech recognition platform; it does not rely on [[cloud computing]] for accuracy, dictation, or recognition, but adapts based on contexts, grammars, speech samples, training sessions, and vocabularies. It provides a personal dictionary that allows users to include or exclude words or expressions from dictation and to record pronunciations to increase recognition accuracy. Custom language models are also supported.
With Windows Vista, WSR was developed to be part of Windows, as speech recognition was previously exclusive to applications such as [[Windows Media Player]]. It is present in [[Windows 7]], [[Windows 8]], [[Windows 8.1]], [[Windows RT]], [[Windows 10]], and [[Windows
==History==
Microsoft was involved in speech recognition and [[speech synthesis]] research for many years before WSR. In 1993, Microsoft hired [[Xuedong Huang]] from [[Carnegie Mellon University]] to lead its speech development efforts; the company's research led to the development of the [[Speech Application Programming Interface|Speech API]] (SAPI) introduced in 1994.<ref name="TalkingWindowsVista">{{cite web |url=http://msdn2.microsoft.com/en-us/magazine/cc163663.aspx |title=Exploring New Speech Recognition And Synthesis APIs In Windows Vista |last=Brown |first=Robert |publisher=[[Microsoft]] |work=MSDN Magazine |
===Windows Vista===
[[File:WindowsVistaPreliminaryWSR.PNG
At [[Windows Hardware Engineering Conference|WinHEC 2002]] Microsoft announced that Windows Vista (codenamed "Longhorn") would include advances in speech recognition and in features such as [[microphone array]] support<ref name="WinHEC2002">{{cite web |url=https://www.pcmag.com/article2/0,2817,1183143,00.asp |title=WinHEC: The Pregame Show |last=Stam |first=Nick |date=April 16, 2002 |publisher=[[Ziff Davis Media]] |work=[[PC Magazine]] |
During WinHEC 2004 Microsoft included WSR as part of a strategy to improve productivity on mobile PCs.<ref name="MobilePCs">{{cite web |url=http://download.microsoft.com/download/1/8/f/18f8cee2-0b64-41f2-893d-a6f2295b40c8/SW04023_WINHEC2004.ppt |title=Windows For Mobile PCs And Tablet PCs — CY05 And Beyond |last=Suokko |first=Matti |date=2004 |publisher=[[Microsoft]] |
During a demonstration by Microsoft on July 27, 2006—before Windows Vista's [[release to manufacturing]] (RTM)—a notable incident involving WSR occurred that resulted in an unintended output of "Dear aunt, let's set so double the killer delete select all" when several attempts to dictate led to consecutive output errors;<ref name="GoodDemos">{{cite web |url=http://blogs.reuters.com/blog/archives/1991 |title=Updated – When good demos go (very, very) bad |last=Auchard |first=Eric |date=July 28, 2006 |publisher=[[Thomson Reuters]] |
Reports from early 2007 indicated that WSR is vulnerable to attackers using speech recognition for malicious operations by playing certain audio commands through a target's speakers;<ref name="SpeechRecognitionHole">{{cite web |url=http://news.bbc.co.uk/2/hi/technology/6320865.stm |title=Vista has speech recognition hole |date=February 1, 2007 |publisher=[[British Broadcasting Corporation|BBC]] |work=[[BBC News]] |
===Windows 7===
[[File:DictationScratchpad.png|thumb|200px|The dictation scratchpad in Windows 7 replaces the "enable dictation everywhere" option of Windows Vista.]]
WSR was updated to use [[Microsoft UI Automation]], and its engine now uses the [[Technical features new to Windows Vista#Audio stack architecture|WASAPI]] audio stack, substantially enhancing its performance and enabling support for [[echo suppression and cancellation|echo cancellation]], respectively. The document harvester, which can analyze and collect text in email and documents to contextualize user terms, has improved performance and now runs periodically in the background instead of only after recognizer startup. Sleep mode has also seen performance improvements and, to address security issues, the recognizer is turned off by default after users speak "stop listening" instead of being suspended. Windows 7 also introduces an option to submit speech training data to Microsoft to improve future recognizer versions.<ref name="SRWindows7">{{cite web |url=http://blogs.msdn.com/b/tsfaware/archive/2009/01/29/what-s-new-in-windows-speech-recognition.aspx |title=What's new in Windows Speech Recognition? |last=Brown |first=Eric |date=January 29, 2009 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |
A new dictation scratchpad interface functions as a temporary document into which users can dictate or type text for insertion into applications that are not compatible with the [[Text Services Framework]].<ref name="SRWindows7"/> Windows Vista previously provided an "enable dictation everywhere" option for such applications.<ref name="DictationWSR">{{cite web |url=https://blogs.msdn.microsoft.com/speech/2007/10/24/where-does-dictation-work-in-windows-speech-recognition/ |title=Where does dictation work in Windows Speech Recognition? |last=Brown |first=Eric |date=October 24, 2007 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |
===Windows 8.x and Windows RT===
WSR can be used to control the [[Metro (design language)|Metro]] user interface in Windows 8, Windows 8.1, and Windows RT with commands to open the [[Windows shell#Charms|Charms bar]] ("Press Windows C"); to dictate or display commands in [[Universal Windows Platform apps#In Windows 8.x|Metro-style apps]] ("Press Windows Z"); to perform tasks in apps (e.g., "Change to Celsius" in [[MSN#Weather|MSN Weather]]); and to display all installed apps listed by the [[Start menu#Third version|Start screen]] ("Apps").<ref name="Windows8SR">{{cite web |url=http://windows.microsoft.com//en-US//windows-8//using-speech-recognition |title=How to use Speech Recognition |publisher=[[Microsoft]] |work=Windows Support |
===Windows 10===
WSR is featured in the [[Settings (Windows)|Settings]] application starting with the Windows 10 April 2018 Update ([[Windows 10 version history|Version 1803]]); the change first appeared in [[Windows Insider|Insider]] Preview Build 17083.<ref name="WSRInsider">{{cite web |url=https://blogs.windows.com/windowsexperience/2018/01/24/announcing-windows-10-insider-preview-build-17083-for-pc/ |title=Announcing Windows 10 Insider Preview Build 17083 for PC |last=Sarkar |first=Dona |date=January 24, 2018 |publisher=[[Microsoft]] |work=Windows Blogs |
===Windows 11===
In Windows 11 version 22H2, Microsoft introduced a second voice application, Voice Access, alongside WSR.<ref>{{Cite web |title=Set up voice access - Microsoft Support |url=https://support.microsoft.com/en-us/topic/set-up-voice-access-9fc44e29-12bf-4d86-bc4e-e9bb69df9a0e |access-date=2022-12-10 |website=support.microsoft.com}}</ref><ref>{{Cite web |last=Hachman |first=Mark |title=New Windows 11 build tests Voice Access, Spotlight backgrounds |url=https://www.pcworld.com/article/558293/new-windows-11-build-tests-voice-access-spotlight-backgrounds-feature.html |access-date=2022-12-10 |website=PCWorld |language=en}}</ref> In December 2023 Microsoft announced that WSR is deprecated in favor of Voice Access and may be removed in a future build or release of Windows.<ref name="DeprecatedFeatures">{{cite web |url=https://learn.microsoft.com/en-us/windows/whats-new/deprecated-features |title=Deprecated features in the Windows client - What's new in Windows |author=[[Microsoft]] |access-date=December 7, 2023}}</ref>
==Overview and features==
WSR allows a user to control applications and the Windows [[desktop metaphor|desktop]] [[user interface]]
When
WSR is a locally processed speech recognition platform; it does not rely on cloud computing for accuracy, dictation, or recognition.<ref name="MicrosoftPrivacyStatement">{{cite web |url=https://privacy.microsoft.com/en-us/privacystatement |title=Microsoft Privacy Statement |publisher=[[Microsoft]] |
===Interface===
* '''Sleeping''': The recognizer will not listen for or respond to commands other than "Start listening"
* '''Off''': The recognizer will not listen or respond to any commands; this mode can be enabled by speaking "Stop listening"
Colors of the recognizer listening mode button denote its various modes of operation: blue when listening; blue-gray when sleeping; gray when turned off; and yellow when the user switches context (e.g., from the desktop to the taskbar) or when a voice command is misinterpreted. The status area can also display custom user information as part of [[Windows Speech Recognition#Macros|Windows Speech Recognition Macros]].<ref name="WSRMacrosPreview">{{cite web |url=http://kurtsh.com/2008/04/29/beta-windows-speech-recognition-macros-technology-preview/ |title=BETA: 'Windows Speech Recognition Macros' Technology Preview |last=Shintaku |first=Kurt |date=April 29, 2008 |
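The listening modes and button colors described above can be pictured as a small state machine. The following Python sketch is illustrative only and is not Microsoft's implementation: the names <code>Mode</code>, <code>BUTTON_COLORS</code>, and <code>next_mode</code> are hypothetical, the transient yellow state is omitted, and the behavior of the listening mode (accepting commands and switching off on "Stop listening") is an assumption drawn from the mode descriptions in this section.

```python
from enum import Enum


class Mode(Enum):
    LISTENING = "listening"
    SLEEPING = "sleeping"
    OFF = "off"


# Button colors as described in the text (yellow is transient and omitted here).
BUTTON_COLORS = {
    Mode.LISTENING: "blue",
    Mode.SLEEPING: "blue-gray",
    Mode.OFF: "gray",
}


def next_mode(mode: Mode, phrase: str) -> Mode:
    """Return the mode that follows a spoken phrase (hypothetical model)."""
    command = phrase.strip().lower()
    if mode is Mode.OFF:
        # Off: the recognizer does not listen or respond to any command.
        return Mode.OFF
    if mode is Mode.SLEEPING:
        # Sleeping: only "Start listening" is honored.
        return Mode.LISTENING if command == "start listening" else Mode.SLEEPING
    # Listening: "Stop listening" switches the recognizer off.
    if command == "stop listening":
        return Mode.OFF
    return Mode.LISTENING


mode = next_mode(Mode.SLEEPING, "Start listening")
print(mode.value, BUTTON_COLORS[mode])
```

In this model a recognizer that has been switched off cannot be reactivated by voice, which matches the description of the off mode.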
[[File:WSR-AlternatesPanel.png|thumb|200px|right|The alternates panel displaying suggestions for a phrase.]]
====Alternates panel====
An alternates panel serves as a disambiguation interface that lists items interpreted as relevant to a user's spoken word(s); if the word or phrase that the user wants to insert into an application is listed among the results, the user can speak its corresponding number and confirm the choice by speaking "OK" to insert it into the application.<ref name="Modes">{{cite web |url=http://blogs.msdn.com/b/robch/archive/2007/11/19/speech-macros-typing-mode-and-spelling-mode-in-windows-speech-recognition.aspx |title=Speech Macros, Typing Mode and Spelling Mode in Windows Speech Recognition |last=Chambers |first=Rob |date=November 19, 2007 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |
===Common commands===
:: '''Keyboard shortcuts:''' "Press ''keyboard key''"; "Press ''{{Keypress|SHIFT}}'' plus ''{{Keypress|a}}''"; "Press capital ''{{Keypress|b}}''."
:: Keys that can be pressed without first giving the press command include: {{Keypress|Backspace}}, {{Keypress|Delete}}, {{Keypress|End}}, {{Keypress|Enter}}, {{Keypress|Home}}, {{Keypress|Page Down}}, {{Keypress|Page Up}}, and {{Keypress|Tab}}.<ref name="CommonCommands"/>
:: '''Mouse commands:''' "Click"; "Click ''that''"; "Double-click"; "Double-click ''that''"; "Mark"; "Mark ''that''"; "Right-click"; "Right-click ''that''"; "[[Windows Speech Recognition#Mousegrid|
:: '''Window management commands:''' "Close (alternatively maximize, minimize, or restore) window"; "Close ''that''"; "Close ''name of open application''"; "Switch applications"; "Switch to ''name of open application''"; "Scroll ''direction''"; "Scroll ''direction'' in ''number of pages''"; "Show desktop"; "[[Windows Speech Recognition#Show numbers|Show
: '''Speech recognition commands:''' "Start listening"; "Stop listening"; "Show speech options"; "Open speech dictionary"; "Move speech recognition"; "Minimize speech recognition"; "Restore speech recognition".<ref name="CommonCommands"/> In the English language, applicable commands can be shown by speaking "What can I say?"<ref name="SpeechRecognition"/> Users can also query the recognizer about tasks in Windows by speaking "How do I ''task name''" (e.g., "How do I install a printer?") which opens related help documentation.<ref name="General Commands">{{cite web |url=https://blogs.msdn.microsoft.com/robch/2007/03/12/windows-speech-recognition-general-commands/ |title=Windows Speech Recognition: General commands |last=Chambers |first=Rob |date=March 12, 2007 |publisher=[[Microsoft]] |work=[[Microsoft Developer Network|MSDN]] |
[[File:Mousegrid.png|thumb|160px|right|The MouseGrid command displaying a grid of numbers on the Windows Vista desktop.]]
====''MouseGrid''====
''MouseGrid'' enables users to control the mouse cursor by overlaying numbers across nine regions on the screen; these regions gradually narrow as a user speaks the number(s) of the region on which to focus until the desired interface element is reached. Users can then issue commands including "Click ''number of region''," which moves the mouse cursor to the desired region and then clicks it; and "Mark ''number of region''", which allows an item (such as a [[icon (computing)|computer icon]]) in a region to be selected, which can then be clicked with the previous ''click'' command. Users also can interact with multiple regions
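The narrowing behavior of ''MouseGrid'' amounts to repeatedly subdividing a rectangle into a 3×3 grid. The following Python sketch is illustrative only and is not Microsoft's implementation: the <code>Region</code> class and <code>narrow</code> function are hypothetical, and the layout of the numbers (1 to 9, left to right and top to bottom) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    left: float
    top: float
    width: float
    height: float

    def center(self):
        """Point where a "Click" command would land in this region."""
        return (self.left + self.width / 2, self.top + self.height / 2)

    def subregion(self, number: int) -> "Region":
        """Return the cell labelled `number` (1-9) of a 3x3 grid over this region."""
        if not 1 <= number <= 9:
            raise ValueError("region number must be between 1 and 9")
        row, col = divmod(number - 1, 3)  # assumed row-major numbering
        cell_w, cell_h = self.width / 3, self.height / 3
        return Region(self.left + col * cell_w, self.top + row * cell_h, cell_w, cell_h)


def narrow(screen: Region, spoken_numbers) -> Region:
    """Apply each spoken region number in turn, shrinking the grid every time."""
    region = screen
    for number in spoken_numbers:
        region = region.subregion(number)
    return region


screen = Region(0, 0, 900, 900)
target = narrow(screen, [5, 1])   # middle cell, then its top-left cell
print(target.center())            # (350.0, 350.0)
```

Each spoken number divides the width and height of the active region by three, so a few numbers are enough to reach a small target on a typical display.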
====''Show Numbers''====
Applications and interface elements that do not present identifiable commands can still be controlled by asking the system to overlay numbers on top of them through a ''
[[File:Show numbers.png|thumb|160px|left|The
===Dictation===
WSR enables dictation of text in
Multiple words in a sentence can be corrected simultaneously (for example, if a user speaks "dictating" but the recognizer interprets this word as "the thing," a user can state "correct the thing" to correct both words at once). In the English language over 100,000 words are recognized by default.<ref name="CustomizedVocabularies"/>
====Speech dictionary====
===Macros===
[[File:WSRMacroOptions.png|thumb|160px|left|An Aero Wizard interface displaying options to create speech recognition macros.]]
WSR supports custom macros through a supplementary application by Microsoft that enables additional [[natural language processing|natural language]] commands.<ref name="WSRM">{{cite web |url=http://www.microsoft.com/en-us/download/details.aspx?id=13045 |title=Windows Speech Recognition Macros |
{| class="wikitable mw-collapsible" style="margin-left: auto; margin-right: auto; border: none; font-size:80%; text-align: center;"
|-
! scope="col" | Application or item
! scope="col" colspan="7" | Sample macro phrases (''italics'' indicate substitutable words)
|-
| '''Microsoft Outlook''' || Send email
| Send email to
| Send email to ''Makoto''
| Send email to ''Makoto Yamagishi''
| Send email to ''Makoto Yamagishi about''
| Send email to ''Makoto Yamagishi about This week's meeting''
| Refresh Outlook email contacts
|-
| '''Microsoft PowerPoint''' || Next slide
| Previous slide
| Next
| Previous
| Go forward ''5'' slides
| Go back ''3'' slides
| Go to slide ''8''
|-
| '''Windows Media Player''' || Next track
| Previous song
| Play ''Beethoven''
| Play something by ''Mozart''
| Play the CD that has ''In the Hall of the Mountain King''
| Play something written in ''1930''
| Pause music
|-
| '''Microphones in Windows''' || Microphone
| Switch microphone
| ''Microphone Array'' microphone
| Switch to ''Line''
| Switch to ''Microphone Array''
| Switch to ''Line'' microphone
| Switch to ''Microphone Array'' microphone
|-
| '''Volume levels in Windows''' || Mute the speakers
| Unmute the speakers
| Turn off the audio
| Increase the volume
| Increase the volume by 2 times
| Decrease the volume by ''50''
| Set the volume to ''66''
|-
| '''WSR Speech Dictionary''' || Export the speech dictionary
| Add a pronunciation
| Add that [''selected text''] to the speech dictionary
| Block that [''selected text''] from the speech dictionary
| Remove that [''selected text'']
| [''Selected text''] sounds like...
| What does that [''selected text''] sound like?
|-
| '''Speech Synthesis''' || Read that [''selected text'']
| Read the next 3 paragraphs
| Read the previous sentence
| Please stop reading
| What time is it?
| What's today's date?
| Tell me the weather forecast for ''Redmond''
|-
|}
Users and developers can create their own macros based on text transcription and substitution; application execution (with support for [[command-line interface#arguments|command-line arguments]]); keyboard shortcuts; emulation of existing voice commands; or a combination of these items. [[
For a macro to load, it must be stored in a ''Speech Macros'' folder within the active user's ''[[My Documents|Documents]]'' directory. All macros are [[digital signature|digitally signed]] by default if a [[public key certificate|user certificate]] is available to ensure that stored commands are not altered or loaded by third-parties; if a certificate is not available, an administrator can create one.<ref name="WSRMacros">{{cite web |url=http://download.microsoft.com/download/F/6/B/F6B71555-D73F-4273-9217-7D872D59BE31/Windows%20Speech%20Recognition%20Macros%20Release%20Notes.docx |title=Windows Speech Recognition Macros Release Notes |
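The sample macro phrases in the table above combine fixed wording with substitutable words. The following Python sketch shows one way such phrases could be matched against an utterance; it is illustrative only and does not reflect the macro format or matching logic actually used by Windows Speech Recognition Macros. The <code>{slot}</code> template syntax and the names <code>TEMPLATES</code>, <code>compile_phrase</code>, and <code>match_phrase</code> are hypothetical.

```python
import re

# Hypothetical templates modelled on the sample phrases; {slot} marks a substitutable word.
# More specific templates are listed first so that they take precedence.
TEMPLATES = [
    "Send email to {name} about {subject}",
    "Send email to {name}",
    "Go to slide {number}",
    "Play something by {artist}",
]


def compile_phrase(template: str) -> re.Pattern:
    """Turn a template into a regular expression with one named group per slot."""
    parts = re.split(r"\{(\w+)\}", template)  # literal text at even indexes, slot names at odd
    pattern = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)
        else:
            pattern += f"(?P<{part}>.+?)"
    return re.compile(f"^{pattern}$", re.IGNORECASE)


def match_phrase(utterance: str, templates=TEMPLATES):
    """Return (template, slot values) for the first matching template, or None."""
    for template in templates:
        found = compile_phrase(template).match(utterance.strip())
        if found:
            return template, found.groupdict()
    return None


print(match_phrase("Send email to Makoto Yamagishi about This week's meeting"))
print(match_phrase("go to slide 8"))
```

Because the first matching template wins, listing "Send email to {name} about {subject}" before "Send email to {name}" keeps the subject from being absorbed into the recipient's name.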
==Performance==
{{As of|2017}} WSR uses Microsoft Speech Recognizer 8.0, the version introduced in Windows Vista. Mark Hachman, a senior editor of ''[[PC World]]'', found it to be 93.6% accurate for dictation without training, a lower rate than that of competing software. According to Microsoft, the rate of accuracy when trained is 99%. Hachman opined that Microsoft does not publicly discuss the feature because of the 2006 incident during the development of Windows Vista, with the result that few users knew that documents could be dictated within Windows before the introduction of [[Cortana (virtual assistant)|Cortana]].<ref name="MSR8"/>
==See also==
* [[Braina]]
* [[List of speech recognition software]]
* [[Microsoft Cordless Phone System]]