→Developer community: fmt |
Rescuing 3 sources and tagging 0 as dead.) #IABot (v2.0.9.5 |
Line 72:
|-
! [[SoX]]
| for manipulating audio files and removing environmental noise.<ref>SoX.
|-
! [[Natural Language Toolkit]]
Line 97:
==Applications==
Voice computing applications span many industries, including voice assistants, healthcare, e-commerce, finance, supply chain, agriculture, text-to-speech, security, marketing, customer support, recruiting, cloud computing, microphones, speakers, and podcasting. The voice technology market is projected to grow at a compound annual growth rate (CAGR) of 19-25% through 2025, making it an attractive industry for startups and investors alike.<ref>{{Cite news |title=Global Speech and Voice Recognition Market 2018 Forecast to 2025 - CAGR Expected to Grow at 25.7% - ResearchAndMarkets.com |url=https://www.businesswire.com/news/home/20180417006122/en/Global-Speech-Voice-Recognition-Market-2018-Forecast |archive-url=
==Legal considerations==
Line 111:
* Interspeech <ref>Interspeech 2018. http://interspeech2018.org/</ref>
* AVEC <ref>{{Cite web |title=14th International Symposium on Advanced Vehicle Control - Speakers, Sessions, Agenda |url=https://www.eventyco.com/event/14th-international-symposium-on-advanced-vehicle-control |access-date=2025-01-10 |website=www.eventyco.com}}</ref>
* IEEE Int'l Conf. on Automatic Face and Gesture Recognition <ref>2018 FG. https://fg2018.cse.sc.edu/ {{Webarchive|url=https://web.archive.org/web/20180511185841/https://fg2018.cse.sc.edu/ |date=2018-05-11 }}</ref>
* ACII 2019, the 8th Int'l Conf. on Affective Computing and Intelligent Interaction <ref>ACII 2019. http://acii-conf.org/2019/</ref>
Line 121:
In June 2017, [[Google]] released AudioSet,<ref>Google AudioSet. https://research.google.com/audioset/</ref> a large-scale collection of human-labeled 10-second sound clips drawn from YouTube videos. Its speech category contains 1,010,480 video clips of human speech, totaling 2,793.5 hours.<ref>{{Cite web |title=AudioSet |url=https://research.google.com/audioset/dataset/speech.html |access-date=2025-01-10 |website=research.google.com}}</ref> It was released as part of the IEEE ICASSP 2017 Conference.<ref>Gemmeke, J. F., Ellis, D. P., Freedman, D., Jansen, A., Lawrence, W., Moore, R. C., Plakal, M., & Ritter, M. (2017, March). Audio set: An ontology and human-labeled dataset for audio events. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on (pp. 776-780). IEEE.</ref>
In November 2017, [[Mozilla Foundation]] released the Common Voice Project, a collection of speech files intended to support the wider open-source machine learning community.<ref>Common Voice Project. https://voice.mozilla.org/ {{Webarchive|url=https://web.archive.org/web/20200227020208/https://voice.mozilla.org/ |date=2020-02-27 }}</ref><ref>{{Cite web |title=Announcing the Initial Release of Mozilla’s Open Source Speech Recognition Model and Voice Dataset {{!}} The Mozilla Blog |url=https://blog.mozilla.org/en/mozilla/announcing-the-initial-release-of-mozillas-open-source-speech-recognition-model-and-voice-dataset/ |access-date=2025-01-10 |website=blog.mozilla.org |language=en-US}}</ref> As of April 2018, the voice bank was 12 GB in size, with more than 500 hours of English-language voice data collected from 112 countries since the project's inception in June 2017.<ref>Mozilla's large repository of voice data will shape the future of machine learning. https://opensource.com/article/18/4/common-voice</ref> The dataset has been used in projects such as DeepSpeech, an open-source transcription model.<ref>DeepSpeech. https://github.com/mozilla/DeepSpeech</ref>
==See also==