Speech coding: Difference between revisions


Revision as of 21:47, 9 September 2024 by Pkna22 (edit summary: "Added template"; tags: Mobile edit, Mobile web edit)
Latest revision as of 22:11, 17 December 2024 by Kvng (minor edit; edit summary: "avoid bit/s wrap at slash")
(6 intermediate revisions by 5 users not shown)
Line 3:
{{more citations needed|date=January 2013}}
'''Speech coding''' is an application of [[data compression]] to [[digital audio]] signals containing [[speech]]. Speech coding applies speech-specific [[parameter estimation]], based on [[audio signal processing]] techniques, to model the speech signal, and combines this model with generic data compression algorithms that represent the resulting model parameters in a compact bitstream.<ref>{{cite journal|first1=M.|last1=Arjona Ramírez|first2=M.|last2=Minami|title=Low bit rate speech coding|journal=Wiley Encyclopedia of Telecommunications, J. G. Proakis, Ed.|location=New York|publisher=Wiley|year=2003|volume=3|pages=1299–1308}}</ref>

Common applications of speech coding are [[mobile telephony]] and [[voice over IP]] (VoIP).<ref>M. Arjona Ramírez and M. Minami, "Technology and standards for low-bit-rate vocoding methods," in ''The Handbook of Computer Networks'', H. Bidgoli, Ed., New York: Wiley, 2011, vol. 2, pp. 447–467.</ref> The most widely used speech coding technique in mobile telephony is [[linear predictive coding]] (LPC). In VoIP applications, the most widely used techniques are LPC and the [[modified discrete cosine transform]] (MDCT).{{Citation needed|date=December 2019}}

Line 15:
# Waveform coders
#* Time-domain: [[PCM]], [[ADPCM]]
#* Frequency-domain: [[sub-band coding]], [[ATRAC]]
# [[Vocoder]]s
#* [[Linear predictive coding]] (LPC)

Line 22:
== Sample companding viewed as a form of speech coding ==
The [[A-law]] and [[μ-law algorithm]]s used in [[G.711]] PCM [[digital telephony]] can be seen as an early precursor of speech coding. They require only 8 bits per sample but give effectively 12 [[audio bit depth|bits of resolution]].<ref>{{cite book|first1=N. S.|last1=Jayant|first2=P.|last2=Noll|title=Digital Coding of Waveforms|location=Englewood Cliffs|publisher=Prentice-Hall|year=1984}}</ref> Logarithmic companding is consistent with human hearing: low-amplitude noise is audible alongside a low-amplitude speech signal, but it is masked by a high-amplitude one. This would generate unacceptable distortion in a music signal. Speech, however, has a peaky waveform and a simple frequency structure: it is a [[periodic waveform]] with a single [[fundamental frequency]] and occasional added noise bursts. Together, these properties make these very simple instantaneous compression algorithms acceptable for speech.{{citation needed|date=July 2023}}{{dubious|discuss=Logarithmic companding for music|date=July 2023}} A wide variety of other algorithms were tried at the time, mostly [[delta modulation]] variants. After careful consideration, the designers of the early digital telephony systems chose the A-law and μ-law algorithms. At the time, their 33% bandwidth reduction (8 bits per sample instead of 12) at very low complexity was an excellent engineering compromise.
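The trade-off behind logarithmic companding can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the G.711 algorithm:

* It uses the continuous μ-law curve with μ = 255, followed by a uniform 8-bit quantizer.
* G.711 itself specifies a segmented, piecewise-linear approximation of the curve, and A-law uses a different curve.
* All function names are hypothetical.

The sketch compares plain 8-bit quantization with 8-bit μ-law quantization for one quiet and one loud sample. It also computes the 33% saving of sending 8 bits instead of 12.

```python
import math

MU = 255  # assumed mu-law parameter (continuous curve, not G.711's segmented table)
BITS = 8  # bits transmitted per sample


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Logarithmically map a sample x in [-1, 1] onto [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Invert mu_law_compress (the receiver-side expander)."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize(v: float, bits: int = BITS) -> float:
    """Round v in [-1, 1] to a uniform grid of 2**bits - 1 levels and return the reconstruction."""
    steps = 2 ** (bits - 1) - 1  # 127 steps on each side of zero for 8 bits
    return round(v * steps) / steps


def linear_pcm(x: float) -> float:
    """Plain 8-bit uniform quantization, no companding."""
    return quantize(x)


def mu_law_pcm(x: float) -> float:
    """Compress, quantize to 8 bits, then expand, as a compander pair would."""
    return mu_law_expand(quantize(mu_law_compress(x)))


def relative_error(x: float, x_hat: float) -> float:
    return abs(x - x_hat) / abs(x)


# Bandwidth saved by sending 8 bits instead of 12 per sample
BANDWIDTH_SAVING = 1 - 8 / 12

if __name__ == "__main__":
    for x in (0.001, 0.9):  # a quiet and a loud sample
        print(f"x={x}: linear error {relative_error(x, linear_pcm(x)):.1%}, "
              f"mu-law error {relative_error(x, mu_law_pcm(x)):.1%}")
    print(f"bandwidth saving: {BANDWIDTH_SAVING:.0%}")
```

The quiet sample shows the benefit. Plain 8-bit quantization rounds it to zero, while μ-law companding reconstructs it with an error of a few percent. The loud sample shows the cost: it comes out somewhat less accurately than with uniform quantization. By the masking argument above, an error at high amplitude is the less audible one.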
The audio performance of A-law and μ-law companding remains acceptable, and there has been no need to replace these algorithms in the fixed-line telephone network.{{citation needed|date=July 2023}}

Line 35:
The [[modified discrete cosine transform]] (MDCT) is the basis of the LD-MDCT technique used by the [[AAC-LD]] format, introduced in 1999.<ref name="Schnell">{{cite conference |last1=Schnell |first1=Markus |last2=Schmidt |first2=Markus |last3=Jander |first3=Manuel |last4=Albert |first4=Tobias |last5=Geiger |first5=Ralf |last6=Ruoppila |first6=Vesa |last7=Ekstrand |first7=Per |last8=Grill |first8=Bernhard |date=October 2008 |title=MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication |url=https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/conference/AES-125-Convention_AAC-ELD-NewStandardForHighQualityCommunication_AES7503.pdf |conference=125th AES Convention |publisher=[[Audio Engineering Society]] |access-date=20 October 2019 |website=[[Fraunhofer IIS]]}}</ref> MDCT has since been widely adopted in [[voice-over-IP]] (VoIP) applications. Examples include the [[G.729.1]] [[wideband audio]] codec, introduced in 2006,<ref name="Nagireddi">{{cite book |last1=Nagireddi |first1=Sivannarayana |title=VoIP Voice and Fax Signal Processing |date=2008 |publisher=[[John Wiley & Sons]] |isbn=9780470377864 |page=69 |url=https://books.google.com/books?id=5AneeZFE71MC&pg=PA69}}</ref> [[Apple Inc.|Apple]]'s [[FaceTime]] (using AAC-LD), introduced in 2010,<ref name="AppleInsider standards 1">{{cite web|url=http://www.appleinsider.com/articles/10/06/08/inside_iphone_4_facetime_video_calling.html|date=June 8, 2010|access-date=June 9, 2010|title=Inside iPhone 4: FaceTime video calling|publisher=[[AppleInsider]]|author=Daniel Eran Dilger}}</ref> and the [[CELT]] codec, introduced in 2011.<ref name="presentation">[http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv Presentation of the CELT codec] {{Webarchive|url=https://web.archive.org/web/20110807182250/http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv |date=2011-08-07 }} by Timothy B. Terriberry (65 minutes of video; see also [http://www.celt-codec.org/presentations/misc/lca-celt.pdf presentation slides] in PDF)</ref>

[[Opus (audio format)|Opus]] is a [[free software]] audio coder. It combines the speech-oriented, LPC-based [[SILK]] algorithm with the lower-latency, MDCT-based CELT algorithm, switching between or combining them as needed for maximal efficiency.<ref name="homepage">{{cite web |url = https://opus-codec.org/ |title=Opus Codec |work=Opus |publisher=Xiph.org Foundation |type=Home page |access-date=July 31, 2012 }}</ref><ref>{{cite conference |last1=Valin |first1=Jean-Marc |last2=Maxwell |first2=Gregory |last3=Terriberry |first3=Timothy B. |last4=Vos |first4=Koen |title=High-Quality, Low-Delay Music Coding in the Opus Codec |conference=135th AES Convention |publisher=[[Audio Engineering Society]] |date=October 2013 |arxiv=1602.04845 }}</ref> It is widely used for VoIP calls in [[WhatsApp]].<ref name="Register">{{cite news |last1=Leyden |first1=John |title=WhatsApp laid bare: Info-sucking app's innards probed |url=https://www.theregister.co.uk/2015/10/27/whatsapp_forensic_analysis/ |access-date=19 October 2019 |work=[[The Register]] |date=27 October 2015}}</ref><ref name="Hazra">{{cite book |last1=Hazra |first1=Sudip |last2=Mateti |first2=Prabhaker |chapter=Challenges in Android Forensics |editor-last1=Thampi |editor-first1=Sabu M. |editor-last2=Pérez |editor-first2=Gregorio Martínez |editor-last3=Westphall |editor-first3=Carlos Becker |editor-last4=Hu |editor-first4=Jiankun |editor-last5=Fan |editor-first5=Chun I. |editor-last6=Mármol |editor-first6=Félix Gómez |title=Security in Computing and Communications: 5th International Symposium, SSCC 2017 |date=September 13–16, 2017 |publisher=Springer |isbn=9789811068980 |pages=286–299 (290) |doi=10.1007/978-981-10-6898-0_24 |chapter-url=https://books.google.com/books?id=1u09DwAAQBAJ&pg=PA290}}</ref><ref name="Srivastava">{{cite book |last1=Srivastava |first1=Saurabh Ranjan |last2=Dube |first2=Sachin |last3=Shrivastaya |first3=Gulshan |last4=Sharma |first4=Kavita |chapter=Smartphone Triggered Security Challenges: Issues, Case Studies and Prevention |editor-last1=Le |editor-first1=Dac-Nhuong |editor-last2=Kumar |editor-first2=Raghvendra |editor-last3=Mishra |editor-first3=Brojo Kishore |editor-last4=Chatterjee |editor-first4=Jyotir Moy |editor-last5=Khari |editor-first5=Manju |title=Cyber Security in Parallel and Distributed Computing: Concepts, Techniques, Applications and Case Studies |date=2019 |publisher=John Wiley & Sons |isbn=9781119488057 |pages=187–206 (200) |doi=10.1002/9781119488330.ch12 |s2cid=214034702 |chapter-url=https://books.google.com/books?id=FzGtDwAAQBAJ&pg=PA200}}</ref> The [[PlayStation 4]] video game console also uses Opus for its [[PlayStation Network]] system party chat.<ref name="playstation">{{cite web|url=https://doc.dl.playstation.net/doc/ps4-oss/ |title=Open Source Software used in PlayStation4 |publisher=Sony Interactive Entertainment Inc. |access-date=2017-12-11}}{{failed verification|reason=Source does not indicate how Opus is used|date=September 2022}}</ref>

A number of codecs with even lower [[bit rate]]s have been demonstrated.
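For scale, the low bit rates quoted in this section can be set against G.711, which sends 8-bit samples at 8000 samples per second, or 64 kbit/s. The Python sketch below is a back-of-the-envelope calculation, not something taken from the cited sources. It uses only the figures quoted in the text: 450 bit/s for Codec2, 600 bit/s for MELPe and 3 kbit/s for Lyra. It reports how many times fewer bits each codec uses than G.711 and the payload for one minute of speech, ignoring packet and container overhead. The helper names are hypothetical.

```python
# G.711 PCM: 8000 samples/s x 8 bits/sample
G711_BIT_RATE = 8000 * 8  # bit/s

# Bit rates quoted in the text (bit/s)
LOW_RATE_CODECS = {
    "Codec2": 450,
    "MELPe": 600,
    "Lyra": 3000,
}


def reduction_factor(bit_rate: int, reference: int = G711_BIT_RATE) -> float:
    """How many times fewer bits per second than the reference codec."""
    return reference / bit_rate


def kilobytes_per_minute(bit_rate: int) -> float:
    """Raw payload for one minute of speech, ignoring packet overhead."""
    return bit_rate * 60 / 8 / 1000


if __name__ == "__main__":
    print(f"G.711: {G711_BIT_RATE} bit/s, {kilobytes_per_minute(G711_BIT_RATE):.1f} kB/min")
    for name, rate in LOW_RATE_CODECS.items():
        print(f"{name}: {rate} bit/s, {reduction_factor(rate):.0f}x fewer bits than G.711, "
              f"{kilobytes_per_minute(rate):.1f} kB/min")
```

The results are as follows:

* G.711 needs 480 kB per minute.
* Codec2's 450 bit/s works out to roughly 142 times fewer bits than G.711.
* Lyra's 3 kbit/s is about 21 times fewer.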
[[Codec2]], which operates at bit rates as low as {{nowrap|450 bit/s}}, is used in amateur radio.<ref>{{cite web |title=GitHub - Codec2 |website=[[GitHub]] |date=November 2019 |url=https://github.com/x893/codec2}}</ref> NATO currently uses [[MELPe]], which offers intelligible speech at {{nowrap|600 bit/s}} and below.<ref>Alan McCree, "A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2006, pp. I 705–708, Toulouse, France</ref> Neural vocoder approaches have also emerged. Google's [[Lyra (codec)|Lyra]] gives an "almost eerie" quality at {{nowrap|3 kbit/s}}.<ref name=":0">{{Cite web |last=Buckley |first=Ian |date=2021-04-08 |title=Google Makes Its Lyra Low Bitrate Speech Codec Public |url=https://www.makeuseof.com/google-lyra-speech-codec-public/ |access-date=2022-07-21 |website=MakeUseOf |language=en-US}}</ref> Microsoft's [[Satin (codec)|Satin]] also uses machine learning, but it operates at a higher, tunable bit rate and is wideband.<ref name=":3">{{Cite web |last=Levent-Levi |first=Tsahi |date=2021-04-19 |title=Lyra, Satin and the future of voice codecs in WebRTC |url=https://bloggeek.me/lyra-satin-webrtc-voice-codecs/ |access-date=2022-07-21 |website=BlogGeek.me |language=en-US}}</ref>

===Sub-fields===

Line 83:
[[Category:Speech codecs| ]]
[[Category:Data compression]]