{{short description|Discipline in computing}}
[[Image:Amazon Echo Plus 02.jpg|thumb|The [[Amazon
'''Voice computing''' is the discipline that develops hardware or software to process voice inputs.
It spans many other fields including [[human-computer interaction]], [[conversational computing]], [[linguistics]], [[natural language processing]], [[automatic speech recognition]], [[speech synthesis]], [[audio engineering]], [[digital signal processing]], [[cloud computing]], [[data science]], [[ethics]], [[law]], and [[information security]].
Voice computing has become increasingly significant, especially with the advent of [[smart speakers]] like the [[Amazon Echo]], voice assistants like [[Google Assistant]], a shift towards [[serverless computing]], and improved accuracy of [[speech recognition]] and [[text-to-speech]] models.
==History==
Voice computing has a rich history.
{| class="wikitable"
|-
| 2011
| [[Apple Inc.|Apple]] releases Siri on iPhone
|-
| 2014
| [[Amazon (company)|Amazon]] releases [[Amazon Echo]] to make voice computing relevant to the public at large.
|}
Around 2011, [[Siri]] emerged on Apple iPhones as the first voice assistant accessible to consumers. This innovation led to a dramatic shift towards voice-first computing architectures. Sony released the [[PS4]] in North America in 2013 (70+ million devices), Amazon released the [[Amazon Echo]] in 2014 (30+ million devices), [[Microsoft]] released Cortana in 2015 (400 million Windows 10 users), Google released [[Google Assistant]] in 2016 (2 billion monthly active users on Android phones), and [[Apple Inc.|Apple]] released the [[HomePod]] in 2018 (500,000 devices sold and 1 billion devices active with iOS/Siri). These shifts, along with advancements in cloud infrastructure (e.g. [[Amazon Web Services]]) and [[codecs]], have solidified the voice computing field and made it relevant to the public at large.
==Hardware==
A
Note that voice computers do not necessarily need a screen, such as in the traditional [[Amazon Echo]]. In other embodiments, traditional [[laptop computers]] or [[mobile phones]] could be used
As of September 2018, there were over 20,000 types of devices compatible with Amazon Alexa.
==Software==
Popular software packages related to voice computing include:
{| class="wikitable"
|-
! Package
! Description
|-
! [[FFmpeg]]
| for [[transcoding]] audio files from one format to another (e.g. .WAV --> .MP3).<ref>FFmpeg. https://www.ffmpeg.org/</ref>
|-
! [[Audacity (audio editor)|Audacity]]
| for recording and filtering audio.<ref>Audacity. https://www.audacityteam.org/</ref>
|-
! [[SoX]]
| for manipulating audio files and removing environmental noise.<ref>SoX. http://sox.sourceforge.net/</ref>
|-
! [[Natural Language Toolkit]]
| for featurizing transcripts with features such as [[parts of speech]].<ref>NLTK. https://www.nltk.org/</ref>
|-
! LibROSA
| for visualizing audio file spectrograms and featurizing audio files.<ref>LibROSA. https://librosa.github.io/librosa/</ref>
|-
! [[OpenSMILE]]
| for featurizing audio files with features such as mel-frequency cepstrum coefficients.<ref>OpenSMILE. https://www.audeering.com/technology/opensmile/</ref>
|-
! PocketSphinx
| for transcribing speech files into text.<ref>{{Cite web | url=https://github.com/cmusphinx/pocketsphinx |title = PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop: Cmusphinx/Pocketsphinx|website = [[GitHub]]|date = 29 March 2020}}</ref>
|-
! Pyttsx3
| for synthesizing speech from text (text-to-speech).<ref>Pyttsx3. https://github.com/nateshmbhat/pyttsx3</ref>
|-
! Pycryptodome
| for encrypting and decrypting audio files.<ref>Pycryptodome. https://pycryptodome.readthedocs.io/en/latest/</ref>
|-
! AudioFlux
|
|}
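As a rough illustration of what "featurizing" an audio file can mean, the following Python sketch uses only the standard library to read a WAV file and compute two very simple features: root-mean-square energy and zero-crossing rate. It is a minimal sketch under stated assumptions, not how LibROSA or OpenSMILE work: the function names (<code>write_tone</code>, <code>read_samples</code>, <code>featurize</code>) are hypothetical, the input is assumed to be mono 16-bit PCM, a synthetic sine tone stands in for a voice recording, and the two features are simpler stand-ins for richer ones such as mel-frequency cepstrum coefficients.

```python
import math
import os
import struct
import tempfile
import wave


def write_tone(path, freq_hz=440.0, seconds=1.0, rate=16000, amplitude=0.5):
    """Write a mono 16-bit PCM sine tone (a stand-in for a voice recording)."""
    frames = bytearray()
    for i in range(int(seconds * rate)):
        sample = amplitude * math.sin(2 * math.pi * freq_hz * i / rate)
        frames += struct.pack("<h", int(sample * 32767))
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 16-bit samples
        wf.setframerate(rate)
        wf.writeframes(bytes(frames))


def read_samples(path):
    """Read a mono 16-bit PCM WAV file; return (samples in [-1, 1), sample rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1 or wf.getsampwidth() != 2:
            raise ValueError("this sketch assumes mono 16-bit PCM input")
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    samples = [s / 32768.0 for (s,) in struct.iter_unpack("<h", raw)]
    return samples, rate


def featurize(samples, rate):
    """Compute duration, RMS energy, and zero-crossing rate for one clip."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # A crossing is counted whenever adjacent samples differ in sign.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return {
        "duration_s": len(samples) / rate,
        "rms": rms,
        "zero_crossings_per_s": crossings * rate / (len(samples) - 1),
    }


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "tone.wav")
    write_tone(demo_path)
    clip, sample_rate = read_samples(demo_path)
    print(featurize(clip, sample_rate))
```

Features of this kind are typically computed over short overlapping windows rather than a whole clip; the whole-clip version is used here only to keep the sketch short.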
==Applications==
Voice computing applications span many industries including voice assistants, healthcare, e-Commerce, finance, supply chain, agriculture, text-to-speech, security, marketing, customer support, recruiting, cloud computing, microphones, speakers, and podcasting. Voice technology is projected to grow at a CAGR of 19-25% by 2025, making it an attractive industry for startups and investors alike.<ref>Businesswire. https://www.businesswire.com/news/home/20180417006122/en/Global-Speech-Voice-Recognition-Market-2018-Forecast</ref>
==Legal considerations==
In the United States,
Moreover, [[COPPA]] is a significant law to protect minors
Lastly, the [[GDPR]] is a European law that establishes the [[right to be forgotten]] and many other protections for EU citizens. It also requires companies to outline clear measures for obtaining consent when audio recordings are made and to define the purpose and scope of how these recordings will be used,
==Research conferences==
Research conferences that relate to voice computing include:
* [[International Conference on Acoustics, Speech, and Signal Processing]]
* Interspeech <ref>Interspeech 2018. http://interspeech2018.org/</ref>
* AVEC <ref>{{Cite web |title=14th International Symposium on Advanced Vehicle Control - Speakers, Sessions, Agenda |url=https://www.eventyco.com/event/14th-international-symposium-on-advanced-vehicle-control |access-date=2025-01-10 |website=www.eventyco.com}}</ref>
* IEEE Int'l Conf. on Automatic Face and Gesture Recognition <ref>2018 FG. https://fg2018.cse.sc.edu/ {{Webarchive|url=https://web.archive.org/web/20180511185841/https://fg2018.cse.sc.edu/ |date=2018-05-11 }}</ref>
* ACII 2019, the 8th Int'l Conf. on Affective Computing and Intelligent Interaction <ref>ACII 2019. http://acii-conf.org/2019/</ref>
==Developer community==
As of January 2018, Google Assistant had roughly 2,000 actions.
As of September 2018, there were over 50,000 Alexa skills worldwide.
In June 2017, [[Google]] released AudioSet,<ref>Google AudioSet. https://research.google.com/audioset/</ref> a large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos. It contains 1,010,480
In November 2017, [[Mozilla Foundation]] released the Common Voice Project, a collection of speech files to help contribute to the larger open source machine learning community.<ref>Common Voice Project. https://voice.mozilla.org/ {{Webarchive|url=https://web.archive.org/web/20200227020208/https://voice.mozilla.org/ |date=2020-02-27 }}</ref><ref>
==See also==
*[[Speech
*[[Natural
*[[Voice user interface]]
*[[Audio codec]]
*[[
*[[Hands-free computing]]
==References==
{{Reflist}}
[[Category:Speech recognition]]
[[Category:Natural language processing]]
[[Category:Computational linguistics]]
[[Category:Computational fields of study]]