Voice computing: Difference between revisions

Revision as of 10:42, 10 January 2025 by Ecangola (→Legal considerations: fmt) (Tag: Visual edit)
Latest revision as of 22:22, 28 August 2025 by InternetArchiveBot (Rescuing 3 sources and tagging 0 as dead.) #IABot (v2.0.9.5)
(5 intermediate revisions by 2 users not shown)
Line 72:
|-
! [[SoX]]
| for manipulating audio files and removing environmental noise.<ref>SoX. ~~http~~https://sox.sourceforge.net/</ref>
|-
! [[Natural Language Toolkit]]

Line 97:
==Applications==
Voice computing applications span many industries including voice assistants, healthcare, e-Commerce, finance, supply chain, agriculture, text-to-speech, security, marketing, customer support, recruiting, cloud computing, microphones, speakers, and podcasting. Voice technology is projected to grow at a CAGR of 19-25% by 2025, making it an attractive industry for startups and investors alike.<ref>{{Cite news |title=Global Speech and Voice Recognition Market 2018 Forecast to 2025 - CAGR Expected to Grow at 25.7% - ResearchAndMarkets.com |url=https://www.businesswire.com/news/home/20180417006122/en/Global-Speech-Voice-Recognition-Market-2018-Forecast |archive-url=~~http~~https://web.archive.org/web/20240119171935/https://www.businesswire.com/news/home/20180417006122/en/Global-Speech-Voice-Recognition-Market-2018-Forecast |archive-date=2024-01-19 |access-date=2025-01-10 |language=en |url-status=live }}</ref>

==Legal considerations==

Line 111:
* Interspeech <ref>Interspeech 2018. http://interspeech2018.org/</ref>
* AVEC <ref>{{Cite web |title=14th International Symposium on Advanced Vehicle Control - Speakers, Sessions, Agenda |url=https://www.eventyco.com/event/14th-international-symposium-on-advanced-vehicle-control |access-date=2025-01-10 |website=www.eventyco.com}}</ref>
* IEEE Int'l Conf. on Automatic Face and Gesture Recognition <ref>2018 FG. https://fg2018.cse.sc.edu/ {{Webarchive|url=https://web.archive.org/web/20180511185841/https://fg2018.cse.sc.edu/ |date=2018-05-11 }}</ref>
* ACII2019 The 8th Int'l Conf. on Affective Computing and Intelligent Interaction <ref>ASCII 2019.
http://acii-conf.org/2019/</ref>

==Developer community==
Google Assistant has roughly 2,000 actions as of January 2018.<ref>~~Voicebot.ai~~{{Cite web |last=Mutchler |first=Ava |date=2018-01-24 |title=Google Assistant App Total Reaches Nearly 2400. But That’s Not the Real Number. It’s really 1719. |url=https://voicebot.ai/2018/01/24/google-assistant-app-total-reaches-nearly-2400-thats-not-real-number-really-1719/ |access-date=2025-01-10 |website=Voicebot.ai |language=en-US}}</ref> There are over 50,000 Alexa skills worldwide as of September 2018.<ref>~~Voicebot.ai.~~{{Cite web |last=Kinsella |first=Bret |date=2018-09-02 |title=Amazon Alexa Now Has 50,000 Skills Worldwide, works with 20,000 Devices, Used by 3,500 Brands |url=https://voicebot.ai/2018/09/02/amazon-alexa-now-has-50000-skills-worldwide-is-on-20000-devices-used-by-3500-brands/ |access-date=2025-01-10 |website=Voicebot.ai |language=en-US}}</ref>

In June 2017, [[Google]] released AudioSet,<ref>Google AudioSet. https://research.google.com/audioset/</ref> a large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos. It contains 1,010,480 videos of human speech files, or 2,793.5 hours in total.<ref>~~Audioset~~{{Cite ~~data.~~web |title=AudioSet |url=https://research.google.com/audioset/dataset/speech.html |access-date=2025-01-10 |website=research.google.com}}</ref> It was released as part of the IEEE ICASSP 2017 Conference.<ref>Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, & Ritter, M. (2017, March). Audio set: An ontology and human-labeled dataset for audio events. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on (pp. 776-780). IEEE.</ref>

In November 2017, [[Mozilla Foundation]] released the Common Voice Project, a collection of speech files to help contribute to the larger open source machine learning community.<ref>Common Voice Project.
https://voice.mozilla.org/ {{Webarchive|url=https://web.archive.org/web/20200227020208/https://voice.mozilla.org/ |date=2020-02-27 }}</ref><ref>~~Common~~{{Cite web |title=Announcing the Initial Release of Mozilla’s Open Source Speech Recognition Model and Voice ~~Project.~~Dataset {{!}} The Mozilla Blog |url=https://blog.mozilla.org/~~blog~~en/~~2017/11/29~~mozilla/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset/ |access-date=2025-01-10 |website=blog.mozilla.org |language=en-US}}</ref> The voicebank is currently 12GB in size, with more than 500 hours of English-language voice data that have been collected from 112 countries since the project's inception in June 2017.<ref>Mozilla's large repository of voice data will shape the future of machine learning. https://opensource.com/article/18/4/common-voice</ref> This dataset has already resulted in creative projects like the DeepSpeech model, an open source transcription model.<ref>DeepSpeech. https://github.com/mozilla/DeepSpeech</ref>

==See also==