HTML audio

HTML5 Audio is a subject of the HTML5 draft specification, covering audio input, playback, and synthesis, as well as speech-to-text, in the browser.

<audio> element

This table documents the current support for audio codecs by the <audio> element. Where a cell gives a version number, it indicates the earliest release with support.

Browser	Operating system	Ogg Vorbis	WAV PCM	MP3	AAC	WebM
Google Chrome	All supported	Yes	Yes	Yes	Yes	Yes
Internet Explorer	Windows	Depends	No	9.0	9.0	No
Mozilla Firefox	All supported	1.9.1	1.9.1	No	No	2.0
Opera	All supported	10.50	No	Depends	Depends	10.60
Safari	OS X	Depends	3.1	3.1	3.1	Depends
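
Because support varies this much between browsers, pages commonly probe the browser at run time instead of relying on a fixed table. The following is a minimal sketch, assuming the standard `canPlayType` method of the media element. The helper name `detectAudioSupport` and the MIME strings chosen for each column are illustrative assumptions, not taken from the table above.

```javascript
// MIME types assumed to correspond to the table's columns (illustrative choices).
const CANDIDATE_TYPES = {
  "Ogg Vorbis": 'audio/ogg; codecs="vorbis"',
  "WAV PCM": 'audio/wav; codecs="1"',
  "MP3": "audio/mpeg",
  "AAC": 'audio/mp4; codecs="mp4a.40.2"',
  "WebM": 'audio/webm; codecs="vorbis"',
};

// Hypothetical helper: asks an <audio>-like object which formats it may play.
// canPlayType returns "", "maybe", or "probably"; an empty string means "no".
function detectAudioSupport(audio) {
  const support = {};
  for (const [format, mime] of Object.entries(CANDIDATE_TYPES)) {
    support[format] = audio.canPlayType(mime) !== "";
  }
  return support;
}

// In a browser: detectAudioSupport(document.createElement("audio"))
```

The result is an object keyed by format name with boolean values. Note that "maybe" and "probably" are both treated as playable here, which is a simplification.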

Web Audio APIs

The Web Audio API specification, developed by Google, describes a high-level JavaScript API for processing and synthesizing audio in web applications. The primary paradigm is an audio routing graph, in which a number of AudioNode objects are connected together to define the overall audio rendering. The actual processing primarily takes place in the underlying implementation (typically optimized assembly, C, or C++ code), but direct JavaScript processing and synthesis are also supported.^[1] Google's Chrome browser has implemented this API since version 14, released in 2011.^[2]
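
The routing-graph idea can be illustrated with a small sketch that connects a source node to a processing node and then to the output. It assumes the method names of the present-day standardized API (`createOscillator`, `createGain`, `connect`, `destination`); early Chrome releases exposed the context under a vendor prefix (`webkitAudioContext`) and used some older method names, so this should not be read as the exact 2011 interface. The helper name `buildToneGraph` is hypothetical.

```javascript
// Hypothetical helper: builds the graph  oscillator -> gain -> destination.
// `ctx` is expected to be an AudioContext (or an object with the same shape).
function buildToneGraph(ctx, frequencyHz, volume) {
  const oscillator = ctx.createOscillator(); // source node
  const gain = ctx.createGain();             // processing node

  oscillator.frequency.value = frequencyHz;  // pitch in hertz
  gain.gain.value = volume;                  // linear gain, 1.0 = unchanged

  oscillator.connect(gain);                  // edge: source -> processing
  gain.connect(ctx.destination);             // edge: processing -> output

  return { oscillator, gain };
}

// In a browser, typically in response to a user gesture such as a click:
//   const ctx = new AudioContext();
//   const { oscillator } = buildToneGraph(ctx, 440, 0.2);
//   oscillator.start();
```

Passing the context in as a parameter keeps the graph-building logic separate from context creation, which browsers may restrict until the user interacts with the page.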

Mozilla's Firefox browser has implemented a similar Audio Data API extension since version 4, released in 2011, but Mozilla warns that it is non-standard and deprecated.^[3] Some JavaScript audio processing and synthesis libraries, such as Audiolet, support both APIs.
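
Direct synthesis in JavaScript, which both APIs allow, amounts to filling a buffer with sample values. The sketch below generates a sine wave as floating-point samples; the helper name `generateSine` is hypothetical, and the `mozSetup` and `mozWriteAudio` calls shown in the trailing comment are recalled from the deprecated Mozilla extension and should be treated as an assumption rather than a verified interface.

```javascript
// Hypothetical helper: synthesizes `length` samples of a sine wave with
// values in the range [-1, 1], stored as 32-bit floats.
function generateSine(frequencyHz, sampleRate, length) {
  const samples = new Float32Array(length);
  for (let i = 0; i < length; i++) {
    // Phase advances by 2*pi*f/sampleRate radians per sample.
    samples[i] = Math.sin((2 * Math.PI * frequencyHz * i) / sampleRate);
  }
  return samples;
}

// With the deprecated Audio Data API, one second of a 440 Hz tone might
// have been written roughly like this (assumed, not verified):
//   const out = new Audio();
//   out.mozSetup(1, 44100);                              // 1 channel, 44.1 kHz
//   out.mozWriteAudio(generateSine(440, 44100, 44100));
```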

The W3C Audio Working Group is also considering the MediaStream Processing API specification developed by Mozilla.^[4] In addition to audio mixing and processing, it covers more general media streaming, including synchronization with HTML elements, capture of audio and video streams, and peer-to-peer routing of such media streams.^[5]

Speech API

The Speech API aims to provide an alternative input method for web applications, one that does not require a keyboard. With this API, developers can give web apps the ability to transcribe a user's voice to text from the computer's microphone. The recorded audio is sent to speech servers for transcription, after which the text is entered for the user. The API itself is agnostic of the underlying speech recognition implementation and can support both server-based and embedded recognizers.^[6] The HTML Speech Incubator Group has proposed implementing speech technology in browsers in the form of uniform, cross-platform APIs. The API contains both:^[7]

Speech Input API
Text to Speech API

Google integrated this feature into Google Chrome in March 2011,^[8] letting users search the web by voice with code like:

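 <script type="text/javascript">
   // Submit the enclosing form as soon as speech input changes the field.
   function startSearch(event) {
     event.target.form.submit();
   }
 </script>
 <form action="http://www.google.com/search">
   <!-- Call the handler and pass it the event; a bare function name would do nothing. -->
   <input type="search" name="q" speech required onspeechchange="startSearch(event)">
 </form>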

Competition

The adoption of HTML5 audio, as with HTML5 video, has become polarised between proponents of free and patented formats.

Apple and Microsoft, which between them account for around 39% of the browser market, support the industry-standard, ISO-defined formats AAC and the older MP3. They cite superior performance and the risk of a submarine patent attack from formats that are believed, but not guaranteed, to be 'free'.

Mozilla and Opera, controlling 24% of the market, support the free and open Vorbis and WebM formats, and criticise the patent-encumbered nature of AAC. However, the proprietary nature of the Vorbis format (it is controlled by Xiph.org) has also been criticised. In 2007, the W3C retracted its recommendation to use Vorbis, citing risks over unknown patents.

Google, controlling 27% of the market, has so far provided support for all common formats.

The result is that for a website to guarantee HTML5 audio for all users, it has to make two formats available, often Vorbis, as used on Wikipedia, and AAC.
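
Serving two formats means something has to choose between them for each visitor. A minimal sketch of that choice, assuming the standard `canPlayType` method; the helper name `pickPlayableSource` and the file names are placeholders, not taken from any real site:

```javascript
// Hypothetical helper: returns the URL of the first source the browser
// reports it can play, or null if none is playable.
function pickPlayableSource(audio, sources) {
  for (const { url, type } of sources) {
    if (audio.canPlayType(type) !== "") {
      return url;
    }
  }
  return null;
}

// Placeholder file names; a site would list its own encodings here.
const SOURCES = [
  { url: "clip.ogg", type: 'audio/ogg; codecs="vorbis"' },
  { url: "clip.m4a", type: 'audio/mp4; codecs="mp4a.40.2"' },
];

// In a browser: pickPlayableSource(document.createElement("audio"), SOURCES)
```

In markup, the same effect is usually achieved without script by nesting several `<source>` elements inside one `<audio>` element and letting the browser pick the first one it supports.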

References

[1] Chris Rogers (2012-03-15). "Web Audio API". W3C. Archived from the original on 2012-03-15. Retrieved 2012-07-04.

[2] Scott Gilbertson (2011-09-19). "Chrome 14 Adds Better Audio, 'Native Client' Support". Webmonkey. Wired. Retrieved 2012-07-04.

[3] "Introducing the Audio API extension". Mozilla Developer Network. Mozilla. 2012-03-05. Archived from the original on 2012-03-05. Retrieved 2012-07-04.

[4] "Audio Processing API". W3C. 2011-12-15. Archived from the original on 2011-12-15. Retrieved 2012-07-04.

[5] Robert O'Callahan (2012-05-31). "MediaStream Processing API". W3C. Retrieved 2012-07-04.

[6] "API draft". Retrieved January 28, 2012.

[7] "HTML5 Speech API". Retrieved January 28, 2012.

[8] "Talking to your computer". Retrieved January 28, 2012.
