{{Short description|Application programming interface for Microsoft Windows}}
{{about|the Speech API||SAPI (disambiguation)}}
The '''Speech Application Programming Interface''' or '''SAPI''' is an [[Application programming interface|API]] developed by [[Microsoft]] to allow the use of [[speech recognition]] and [[speech synthesis]] within [[Microsoft Windows|Windows]] applications. Several versions of the API have been released, shipped either as part of a Speech [[Software development kit|SDK]] or as part of the Windows [[Operating System|OS]] itself. Applications that use SAPI include [[Microsoft Office]], [[Microsoft Agent]] and [[Microsoft Speech Server]].
In general, all versions of the API have been designed so that a software developer can write an application that performs speech recognition and synthesis through a standard set of interfaces, accessible from a variety of programming languages. In addition, a third-party company can produce its own speech recognition and [[Speech Synthesis|text-to-speech]] engines, or adapt existing engines, to work with SAPI. In principle, as long as these engines conform to the defined interfaces, they can be used instead of the Microsoft-supplied engines.
In general, the Speech API is a freely redistributable component that can be shipped with any Windows application that uses speech technology. Many (although not all) versions of the speech recognition and synthesis engines are also freely redistributable.
There have been two main 'families' of the Microsoft Speech API. SAPI versions 1 through 4 are all similar to each other, with extra features in each newer version. SAPI 5, however, was a completely new interface, released in 2000. Since then, several sub-versions of this API have been released.
==Basic architecture==
In SAPI 5, however, applications and engines do not communicate directly with each other. Instead, each talks to a [[Run-time system|runtime]] component ('''sapi.dll'''). This component implements one API that applications use and a separate set of interfaces for engines.
Typically, in SAPI 5, applications issue calls through the API (for example, to load a recognition grammar, start recognition, or provide text to be synthesized). The sapi.dll runtime component interprets these commands and processes them, where necessary calling on the engine through the engine interfaces (for example, the loading of
In addition to the actual API definition and runtime
*''API definition files'' - in [[MIDL]] and as C or C++ header files.
*''Runtime components'' - e.g. sapi.dll.
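The layering described above, where applications and engines never talk to each other directly and every call passes through the runtime, can be illustrated with a small model. This is a sketch under stated assumptions, not SAPI code: the classes <code>TtsEngine</code>, <code>Runtime</code>, <code>VendorAEngine</code> and <code>VendorBEngine</code> and their methods are hypothetical stand-ins, the "audio" is just bytes so results can be checked, and the real sapi.dll exposes COM interfaces rather than Python classes.

```python
# Toy model of the SAPI 5 layering: the application only talks to a runtime
# object (standing in for sapi.dll), which forwards each request to whichever
# engine is registered. All class and method names are hypothetical; they
# are not SAPI identifiers.
from abc import ABC, abstractmethod
from typing import Dict, Optional


class TtsEngine(ABC):
    """Hypothetical engine-side interface every engine must implement."""

    @abstractmethod
    def render(self, text: str) -> bytes:
        """Turn text into 'audio' (here just bytes, so results are checkable)."""


class VendorAEngine(TtsEngine):
    def render(self, text: str) -> bytes:
        return ("A:" + text).encode("utf-8")


class VendorBEngine(TtsEngine):
    """A second, interchangeable engine from another (imaginary) vendor."""

    def render(self, text: str) -> bytes:
        return ("B:" + text.upper()).encode("utf-8")


class Runtime:
    """Stand-in for the sapi.dll runtime: the only object applications use."""

    def __init__(self) -> None:
        self._engines: Dict[str, TtsEngine] = {}
        self._default: Optional[str] = None

    def register(self, name: str, engine: TtsEngine) -> None:
        # The first engine registered becomes the default.
        self._engines[name] = engine
        if self._default is None:
            self._default = name

    def speak(self, text: str, voice: Optional[str] = None) -> bytes:
        # Application-facing call: the runtime chooses the engine and calls it
        # through the engine interface, so the application never holds a
        # reference to a specific engine.
        name = voice if voice is not None else self._default
        engine = self._engines.get(name) if name is not None else None
        if engine is None:
            raise LookupError(f"no engine registered as {name!r}")
        return engine.render(text)


if __name__ == "__main__":
    runtime = Runtime()
    runtime.register("vendor-a", VendorAEngine())
    runtime.register("vendor-b", VendorBEngine())
    print(runtime.speak("hello"))                    # b'A:hello'
    print(runtime.speak("hello", voice="vendor-b"))  # b'B:HELLO'
```

Because the application only holds a <code>Runtime</code>, swapping <code>VendorAEngine</code> for <code>VendorBEngine</code> requires no application change, which mirrors the article's point that conforming third-party engines can replace the Microsoft-supplied ones.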
The '''Speech SDK version 5.0''', incorporating the '''SAPI 5.0''' runtime, was released in 2000. This was a complete redesign from previous versions, and neither engines nor applications that used older versions of SAPI could use the new version without considerable modification.
The design of the new API included the concept of strictly separating the application and engine so that all calls were routed through the runtime sapi.dll. This change was intended to make the API more 'engine-independent', preventing applications from inadvertently depending on features of a specific engine. In addition, this change was aimed at making it much easier to incorporate speech technology into an application by moving some management and initialization code into the runtime.
The new API was initially a pure COM API and could be used easily only from C/C++. Support for VB and scripting languages was added later. Operating systems from [[Windows 98]] and [[NT 4.0]] upwards were supported.
Major features of the API include:
*'''Shared Recognizer'''. For desktop speech recognition applications, a recognizer object can be used that runs in a separate process ('''sapisvr.exe'''). All applications using the shared recognizer communicate with this single instance. This allows sharing of resources, removes contention for the microphone and allows for a global UI for control of all speech applications.
*'''In-proc recognizer'''. For applications that require explicit control of the recognition process, the in-proc recognizer object can be used instead of the shared one.
*'''Grammar objects'''. Speech grammars are used to specify the words that the recognizer is listening for. SAPI 5 defines an [[XML]] markup for specifying a grammar, as well as mechanisms to create them dynamically in code. Methods also exist for instructing the recognizer to load a built-in dictation language model.
*'''Voice object'''. This performs speech synthesis, producing an audio stream from text. A markup language (similar to XML, but not strictly XML) can be used for controlling the synthesis process.
*'''Audio interfaces'''. The runtime includes objects for speech input from the microphone and speech output to speakers (or any sound device), as well as objects for reading from and writing to wave files. It is also possible to write a custom audio object to stream audio to or from a non-standard ___location.
*'''User lexicon object'''. This allows custom words and pronunciations to be added by a user or application. These are added to the recognition or synthesis engine's built-in lexicons.
*'''Object tokens'''. This is a concept allowing recognition and TTS engines, audio objects, lexicons and other categories of object to be registered, enumerated and instantiated in a common way.
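To make the '''Grammar objects''' item above more concrete, the following sketch enumerates every phrase that a rule in a SAPI 5-style XML grammar accepts. It is a hedged illustration, not part of SAPI: the grammar content is made up, only a subset of the markup is handled (<code>GRAMMAR</code>, <code>RULE</code>, <code>P</code> for a phrase, <code>L</code> for a list of alternatives, <code>O</code> for an optional phrase), and the helper functions <code>_expand</code> and <code>phrases</code> are hypothetical. In real use SAPI compiles and loads the grammar itself.

```python
# Enumerate the phrases a SAPI 5-style XML grammar rule accepts.
# Only a subset of the format is handled (GRAMMAR, RULE, P, L, O); other
# elements such as RULEREF are treated as empty by this sketch.
import xml.etree.ElementTree as ET
from typing import List

# Made-up example grammar. LANGID is a hexadecimal language ID (409 = U.S. English).
GRAMMAR_XML = """
<GRAMMAR LANGID="409">
  <RULE NAME="OpenCommand" TOPLEVEL="ACTIVE">
    <O>please</O>
    <P>open</P>
    <L>
      <P>mail</P>
      <P>calendar</P>
    </L>
  </RULE>
</GRAMMAR>
"""


def _expand(elem: ET.Element) -> List[List[str]]:
    """Return every word sequence that `elem` can match."""
    if elem.tag == "L":
        # <L>: list of alternatives; exactly one child is spoken.
        alternatives: List[List[str]] = []
        for child in elem:
            alternatives.extend(_expand(child))
        return alternatives
    # RULE, P and O behave as sequences: the element's own text, then each
    # child (followed by any text after it) in document order.
    sequences: List[List[str]] = [(elem.text or "").split()]
    for child in elem:
        child_alts = _expand(child)
        tail = (child.tail or "").split()
        sequences = [s + c + tail for s in sequences for c in child_alts]
    if elem.tag == "O":
        # <O>: optional; the whole element may also be skipped.
        sequences.append([])
    return sequences


def phrases(xml_text: str, rule_name: str) -> List[str]:
    root = ET.fromstring(xml_text.strip())
    for rule in root.iter("RULE"):
        if rule.get("NAME") == rule_name:
            return [" ".join(words) for words in _expand(rule)]
    raise KeyError(rule_name)


if __name__ == "__main__":
    for phrase in phrases(GRAMMAR_XML, "OpenCommand"):
        print(phrase)
```

For the sample rule this prints "please open mail", "please open calendar", "open mail" and "open calendar", that is, the complete set of utterances the recognizer would be listening for with this grammar active.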
====SAPI 5.0====
====SAPI 5.1====
This version shipped in late 2001 as part of the Speech SDK version 5.1. Automation-compliant interfaces were added to the API to allow use from Visual Basic, scripting languages such as [[JScript]], and [[managed code]]. This version of the API and TTS engines
====SAPI 5.2====
* User-specified shortcuts in lexicons: the ability to add a string to the lexicon and associate it with a shortcut word. When dictating, the user can say the shortcut word and the recognizer will return the expanded string.
* Additional functionality and ease-of-programming provided by new types.
* Performance improvements, improved reliability, and security.
* Version 8 of the speech recognition engine ("Microsoft Speech Recognizer")
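The user-specified shortcut feature listed above amounts to a lookup from a spoken shortcut word to a stored expansion string. The sketch below models only that behaviour; the class <code>ShortcutLexicon</code> and its methods are hypothetical names chosen for illustration and do not reproduce the actual SAPI 5.3 lexicon interfaces, and the example shortcut is invented.

```python
# Toy model of a lexicon shortcut: a spoken shortcut word is stored with an
# expansion string, and recognized words are replaced by their expansion.
# Class and method names are hypothetical, not the SAPI 5.3 interface.
from typing import Dict, List


class ShortcutLexicon:
    def __init__(self) -> None:
        self._shortcuts: Dict[str, str] = {}

    def add_shortcut(self, shortcut: str, expansion: str) -> None:
        # Store shortcuts case-insensitively, since spoken words carry no case.
        self._shortcuts[shortcut.lower()] = expansion

    def expand(self, recognized_words: List[str]) -> str:
        # Replace each recognized shortcut word with its expansion string;
        # all other words pass through unchanged.
        return " ".join(self._shortcuts.get(w.lower(), w) for w in recognized_words)


if __name__ == "__main__":
    lexicon = ShortcutLexicon()
    lexicon.add_shortcut("sig", "Best regards, Alex")
    print(lexicon.expand(["thanks", "sig"]))  # thanks Best regards, Alex
```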
===SAPI 5 Voices===
{{main|Microsoft text-to-speech voices}}
[[Microsoft
===Managed code Speech API===
| publisher=Redmond Developer News
| url=http://reddevnews.com/articles/2007/02/15/give-applications-a-voice.aspx?sc_lang=en
|
}}
{{webarchive |url=https://web.archive.org/web/20100114122117/http://reddevnews.com/articles/2007/02/15/give-applications-a-voice.aspx |date=14 January 2010}}</ref> It has similar functionality to SAPI 5 but is better suited to managed code applications. The new API is available on [[Windows XP]], [[Windows Server 2003]], [[Windows Vista]], and [[Windows Server 2008]].
The existing SAPI 5 API can also be used from managed code to a limited extent by creating COM Interop code (helper code designed to assist in accessing COM interfaces and classes). This works well in some scenarios; however, the new API should provide a more seamless experience, equivalent to using any other managed code library.
However, a major obstacle to moving away from COM Interop is that the managed implementation has subtle [[memory leak]]s, which lead to memory fragmentation and rule out the library's use in non-trivial applications. As a workaround, Microsoft has suggested using a different API, which has fewer voices.<ref>[http://connect.microsoft.com/VisualStudio/feedback/details/664196/system-speech-has-a-memory-leak System.Speech has a memory leak | Microsoft Connect]. Connect.microsoft.com. Retrieved on 2013-09-27.</ref>
==Speech functionality in Windows Vista==
* New Speech Synthesis engine and SAPI voice [[Microsoft Anna]]
* [[Managed code]] speech API (codenamed SpeechFX)
* Speech recognition support for 8 languages at release time: U.S. English, U.K. English, traditional Chinese, simplified Chinese, Japanese, Spanish, French, and German, with more languages to be released later.
[[Microsoft Agent]], most notably, and all other Microsoft speech applications use SAPI 5.
| publisher=MSDN
| url=http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/html/system_requirements.asp
|
| archive-url=https://web.archive.org/web/20050822175601/http://msdn.microsoft.com/library/en-us/SAPI51sr/html/system_requirements.asp
| archive-date=2005-08-22
}}
</ref><ref name="SAPI5SysReq">{{Cite web|url=https://documentation.help/SAPI-5/system_requirements.htm|title=Welcome to the Microsoft Speech SDK - Microsoft Speech SDK Documentation|website=documentation.help|access-date=2025-06-20}}</ref>
===SAPI 5===
List as of SAPI version 5.1:<ref name="SAPI compatibility" /><ref name="SAPI5SysReq" />
*[[Windows Server 2003|Microsoft Windows
*[[Windows XP|Microsoft Windows
*[[Windows Me|Microsoft Windows
*[[Windows 2000|Microsoft Windows
*[[Windows 98|Microsoft Windows
*[[Windows NT 4.0|Microsoft Windows
Later versions of SAPI 5 (e.g. SAPI 5.3 and above) are compatible with the following operating systems:
*[[Windows Server|Microsoft Windows Server]] releases from [[Windows Server 2008|2008]] up to [[Windows Server 2025|2025]]
*[[Windows 11|Microsoft Windows 11]]
*[[Windows 10|Microsoft Windows 10]]
*[[Windows 8.1|Microsoft Windows 8.1]]
*[[Windows 8|Microsoft Windows 8]]
*[[Windows 7|Microsoft Windows 7]]
*[[Windows Vista|Microsoft Windows Vista]]
===SAPI 4===
*[[Windows Server 2003|Microsoft Windows Server 2003]] and later
*[[Microsoft Windows XP]] and later
*[[Windows Me|Microsoft Windows Millennium Edition]]
*[[Windows 2000|Microsoft Windows
*[[Windows 98|Microsoft Windows
*[[Windows NT 4.0|Microsoft Windows
*[[Windows 95|Microsoft Windows 95]]
==Major applications using SAPI==
<!-- When adding an application, note which features of SAPI it uses, for example, TTS or speech recognition -->
<!-- Consider splitting this list into TTS and speech recognition -->
*
*[[Windows Speech Recognition]] in [[Windows Vista]] and later
*[[Microsoft Narrator]] in Windows 2000 and later Windows operating systems
*[[Microsoft Office XP]]
*[[Microsoft Excel]] 2002, Microsoft Excel 2003, and Microsoft Excel 2007 for speaking spreadsheet data
*[[Microsoft Voice Command]] for Windows Pocket PC and Windows Mobile
*[[Microsoft Plus#Microsoft Plus! for Windows XP|Microsoft Plus! Voice Command for Windows Media Player]]
*[[Adobe Reader]] uses voice output to read document content
*[[CoolSpeech]], a text-to-speech application that reads text aloud from a variety of sources
*[[Window-Eyes]] screen reader
*[[JAWS (screen reader)|JAWS]] screen reader
*[[NonVisual Desktop Access]] (NVDA), a free and open source screen reader
<!-- Please only add MAJOR applications where speech input or output is a major feature. SDK's and simple TTS support does not qualify -->
==See also==
* [[Comparison of speech synthesizers]]
* [[List of speech recognition software]]
* {{annotated link|SASDK}}
==External links==
*[https://azure.microsoft.com/en-us/blog/global-scale-ai-with-azure-cognitive-services/ Microsoft Cognitive Services Ignite 2018 event blog post]
*[https://web.archive.org/web/20071016060248/http://www.microsoft.com/speech/speech2007/default.mspx Microsoft site for SAPI]
*[http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530 Microsoft download site for Speech API Software Developers Kit version 5.1]
*[http://www.microsoft.com/msj/archive/s233.aspx Microsoft Systems Journal Whitepaper by Mike Rozak on the first version of SAPI]
*[http://blogs.msdn.com/speech Microsoft Speech Team blog]
==References==
{{reflist}}
{{Microsoft APIs}}
{{Speech synthesis}}
[[Category: Microsoft application programming interfaces]]
[[Category:
[[Category: