{{short description|Discipline in computing}}
[[Image:Amazon Echo Plus 02.jpg|thumb|The [[Amazon Echo]], an example of a voice computer]]
'''Voice computing''' is the discipline that develops hardware or software to process voice inputs.<ref>Schwoebel, J. (2018). An Introduction to Voice Computing in Python. Boston; Seattle, Atlanta: NeuroLex Laboratories. https://neurolex.ai/voicebook</ref>
==History==
Voice computing has a rich history.
Voice computers do not necessarily need a screen, as with the traditional [[Amazon Echo]]. In other form factors, [[laptop computers]] or [[mobile phones]] can serve as voice computers. Moreover, the number of interfaces for voice computers has grown with the advent of [[Internet of things|IoT]]-enabled devices, such as cars and televisions.
As of September 2018, over 20,000 types of devices were compatible with Amazon Alexa.
==Software==
{| class="wikitable"
|-
! Package
! Description
|-
|-
! [[SoX]]
| for manipulating audio files and removing environmental noise.<ref>SoX. http://sox.sourceforge.net/</ref>
|-
! [[Natural Language Toolkit]]
| for natural language processing of transcribed text, such as tokenization.
|-
! [[CMU Sphinx]]
| for transcribing speech files into text.<ref>{{Cite web | url=https://github.com/cmusphinx/pocketsphinx |title = PocketSphinx is a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, though it works equally well on the desktop: Cmusphinx/Pocketsphinx|website = [[GitHub]]|date = 29 March 2020}}</ref>
|-
! Pyttsx3
| for converting text to speech.
|-
! Pycryptodome
| for encrypting and decrypting audio files.<ref>Pycryptodome. https://pycryptodome.readthedocs.io/en/latest/</ref>
|-
! AudioFlux
| for audio and music analysis, feature extraction.<ref>AudioFlux. https://github.com/libAudioFlux/audioFlux/</ref>
|}
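For illustration, the following is a minimal sketch (not drawn from the cited sources; the phrase and file names are examples) of how two of the packages above might be combined: Pyttsx3 renders text to an audio file, and Pycryptodome encrypts the resulting recording.

<syntaxhighlight lang="python">
# Illustrative sketch only: synthesize speech with Pyttsx3 and encrypt the
# resulting audio file with Pycryptodome. File names are examples.
import pyttsx3
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

# Text-to-speech: render a phrase to an audio file with the offline engine.
engine = pyttsx3.init()
engine.save_to_file("Voice computing processes voice inputs.", "speech.wav")
engine.runAndWait()

# Encryption: protect the recording with AES in EAX mode.
key = get_random_bytes(16)
cipher = AES.new(key, AES.MODE_EAX)
with open("speech.wav", "rb") as f:
    ciphertext, tag = cipher.encrypt_and_digest(f.read())
with open("speech.wav.enc", "wb") as f:
    for chunk in (cipher.nonce, tag, ciphertext):
        f.write(chunk)
</syntaxhighlight>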
==Applications==
Voice computing applications span many industries, including voice assistants, healthcare, e-commerce, finance, supply chain, agriculture, text-to-speech, security, marketing, customer support, recruiting, cloud computing, microphones, speakers, and podcasting. Voice technology is projected to grow at a compound annual growth rate of 19–25% through 2025, making it an attractive market for startups and investors.
==Legal considerations==
In the United States, states have varying [[telephone call recording laws]]. In some states, it is legal to record a conversation with the consent of only one party; in others, the consent of all parties is required.
Moreover, [[COPPA]] is a significant law protecting minors who use the Internet. With an increasing number of minors interacting with voice computing devices (e.g., the Amazon Alexa), the [[Federal Trade Commission]] relaxed the COPPA rule on October 23, 2017, so that children can issue voice searches and commands.
Lastly, the [[GDPR]] is a European Union regulation that governs the [[right to be forgotten]] and many other rights for EU citizens. The GDPR also makes clear that companies must set out clear measures for obtaining consent when audio recordings are made, and must define the purpose and scope of how these recordings will be used, e.g., for training purposes. The bar for valid consent has been raised under the GDPR: consent must be freely given, specific, informed, and unambiguous; tacit consent is no longer sufficient.<ref>IAPP. https://iapp.org/news/a/how-do-the-rules-on-audio-recording-change-under-the-gdpr/</ref>
* [[International Conference on Acoustics, Speech, and Signal Processing]]
* Interspeech <ref>Interspeech 2018. http://interspeech2018.org/</ref>
* AVEC (Audio/Visual Emotion Challenge and Workshop)
* IEEE Int'l Conf. on Automatic Face and Gesture Recognition <ref>2018 FG. https://fg2018.cse.sc.edu/ {{Webarchive|url=https://web.archive.org/web/20180511185841/https://fg2018.cse.sc.edu/ |date=2018-05-11 }}</ref>
* ACII 2019, the 8th Int'l Conf. on Affective Computing and Intelligent Interaction <ref>ACII 2019. http://acii-conf.org/2019/</ref>
==Developer community==
As of January 2018, Google Assistant had roughly 2,000 actions.
As of September 2018, there were over 50,000 Alexa skills worldwide.
In June 2017, [[Google]] released AudioSet,<ref>Google AudioSet. https://research.google.com/audioset/</ref> a large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos. It contains 1,010,480 clips labeled as human speech, or 2,793.5 hours in total.
In November 2017, the [[Mozilla Foundation]] released the Common Voice Project, a collection of speech files intended to support the larger open source machine learning community.<ref>Common Voice Project. https://voice.mozilla.org/ {{Webarchive|url=https://web.archive.org/web/20200227020208/https://voice.mozilla.org/ |date=2020-02-27 }}</ref>
==See also==
*[[Speech recognition]]
*[[Natural language processing]]
*[[Voice user interface]]
*[[Audio codec]]
[[Category:Computational linguistics]]
[[Category:Computational fields of study]]