Speech coding: Difference between revisions

{{more citations needed|date=January 2013}}
 
'''Speech coding''' is an application of [[data compression]] to [[digital audio]] signals containing [[speech]]. It models the speech signal with speech-specific [[parameter estimation]] based on [[audio signal processing]] techniques, then applies generic data compression algorithms to represent the resulting model parameters in a compact bitstream.<ref>{{cite journal|first1=M. |last1=Arjona Ramírez|first2=M.|last2=Minami|title=Low bit rate speech coding|journal=Wiley Encyclopedia of Telecommunications, J. G. Proakis, Ed.|location=New York| publisher=Wiley|year=2003| volume= 3|pages=1299–1308}}</ref>
 
Common applications of speech coding are [[mobile telephony]] and [[voice over IP]] (VoIP).<ref>M. Arjona Ramírez and M. Minami, "Technology and standards for low-bit-rate vocoding methods," in The Handbook of Computer Networks, H. Bidgoli, Ed., New York: Wiley, 2011, vol. 2, pp. 447–467.</ref> The most widely used speech coding technique in mobile telephony is [[linear predictive coding]] (LPC), while the most widely used in VoIP applications are the LPC and [[modified discrete cosine transform]] (MDCT) techniques.{{Citation needed|date=December 2019}}
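
The parameter estimation step behind linear predictive coding can be illustrated with a minimal sketch. It assumes the textbook autocorrelation method solved by the Levinson-Durbin recursion; the function names, the frame length of 160 samples and the prediction order are illustrative choices, not taken from any particular codec.

```python
import math

def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations for the given autocorrelation.

    Returns (a, error) where a[k-1] is the coefficient of s[n-k] in the
    predictor s[n] ~= sum(a[k-1] * s[n-k]) and error is the energy of
    the remaining prediction residual.
    """
    a = [0.0] * (order + 1)          # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[k] * r[i - k] for k in range(1, i))
        reflection = acc / error
        updated = a[:]
        updated[i] = reflection
        for k in range(1, i):
            updated[k] = a[k] - reflection * a[i - k]
        a = updated
        error *= 1.0 - reflection * reflection
    return a[1:], error

# Illustrative frame: a few sinusoids standing in for voiced speech.
demo_frame = [math.sin(0.3 * n) + 0.5 * math.sin(1.1 * n)
              + 0.1 * math.cos(2.3 * n) for n in range(160)]
demo_r = autocorrelation(demo_frame, 4)
demo_coeffs, demo_error = levinson_durbin(demo_r, 4)
print("predictor coefficients:", demo_coeffs)
print("residual energy / frame energy:", demo_error / demo_r[0])
```

A coder built on this idea would transmit a quantized form of the few predictor coefficients plus a description of the residual, rather than the waveform samples themselves.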
# Waveform coders
#* Time-domain: [[PCM]], [[ADPCM]]
#* Frequency-domain: [[sub-band coding]], [[Adaptive Transform Acoustic Coding|ATRAC]]
# [[Vocoder]]s
#* [[Linear predictive coding]] (LPC)
 
== Sample companding viewed as a form of speech coding ==
The [[A-law algorithm|A-law]] and [[μ-law algorithm]]s used in [[G.711]] PCM [[digital telephony]] can be seen as an early precursor of speech encoding, requiring only 8 bits per sample but giving effectively 12 [[audio bit depth|bits of resolution]].<ref>{{cite book|first1=N. S. |last1=Jayant |first2=P.|last2=Noll|title=Digital coding of waveforms|location=Englewood Cliffs|publisher=Prentice-Hall|year=1984}}</ref> Logarithmic companding is consistent with human hearing perception in that low-amplitude noise is heard alongside a low-amplitude speech signal but is masked by a high-amplitude one. Although this would generate unacceptable distortion in a music signal, the peaky nature of speech waveforms, combined with the simple frequency structure of speech as a [[periodic function|periodic waveform]] having a single [[fundamental frequency]] with occasional added noise bursts, makes these very simple instantaneous compression algorithms acceptable for speech.{{citation needed|date=July 2023}}{{dubious|discuss=Logarithmic companding for music|date=July 2023}}
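
The effect of logarithmic companding on quiet samples can be sketched as follows. This is an illustration only: it uses the continuous μ-law curve with μ = 255 followed by uniform 8-bit quantization, whereas G.711 itself specifies a segmented piecewise-linear approximation of that curve. All function names are hypothetical.

```python
import math

MU = 255.0  # assumed mu value for the continuous mu-law curve

def mu_compress(x):
    """Map a sample in [-1, 1] to [-1, 1] on a logarithmic scale."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

def encode(x, bits=8):
    """Compress, then quantize uniformly to a signed integer code."""
    levels = 2 ** (bits - 1) - 1
    return round(mu_compress(x) * levels)

def decode(code, bits=8):
    """Undo the uniform quantization scale, then expand."""
    levels = 2 ** (bits - 1) - 1
    return mu_expand(code / levels)

def uniform_roundtrip(x, bits=8):
    """Plain uniform quantization at the same bit depth, for comparison."""
    levels = 2 ** (bits - 1) - 1
    return round(x * levels) / levels

quiet = 0.001  # a low-amplitude sample on a full scale of 1.0
print("companded error:", abs(decode(encode(quiet)) - quiet))
print("uniform error:  ", abs(uniform_roundtrip(quiet) - quiet))
```

With the same 8 bits, the companded code keeps a quiet sample that plain uniform quantization would round to zero, at the cost of coarser steps near full scale.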
 
A wide variety of other algorithms were tried at the time, mostly [[delta modulation]] variants, but after careful consideration, the A-law/μ-law algorithms were chosen by the designers of the early digital telephony systems. At the time of their design, their 33% bandwidth reduction at very low complexity made an excellent engineering compromise. Their audio performance remains acceptable, and there has been no need to replace them in the stationary phone network.{{citation needed|date=July 2023}}
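
The 33% figure follows from storing 8 bits per sample instead of the 12 bits of effective resolution mentioned above. The short calculation below makes this explicit; the 8 kHz sampling rate is an assumption taken from G.711 telephony, and the helper name is illustrative.

```python
def bit_rate(sample_rate_hz, bits_per_sample):
    """Bit rate in bit/s of an uncompressed PCM stream."""
    return sample_rate_hz * bits_per_sample

SAMPLE_RATE_HZ = 8000  # assumed narrowband telephony sampling rate

linear_rate = bit_rate(SAMPLE_RATE_HZ, 12)     # 12-bit linear PCM
companded_rate = bit_rate(SAMPLE_RATE_HZ, 8)   # 8-bit companded PCM
reduction = 1 - companded_rate / linear_rate

print(linear_rate, companded_rate, round(100 * reduction, 1))
```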
The [[modified discrete cosine transform]] (MDCT) is used in the LD-MDCT technique of the [[AAC-LD]] format introduced in 1999.<ref name="Schnell">{{cite conference |last1=Schnell|first1=Markus |last2=Schmidt |first2=Markus |last3=Jander |first3=Manuel |last4=Albert |first4=Tobias |last5=Geiger |first5=Ralf |last6=Ruoppila |first6=Vesa |last7=Ekstrand |first7=Per |last8=Bernhard |first8=Grill |date=October 2008 |title=MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication |url=https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/conference/AES-125-Convention_AAC-ELD-NewStandardForHighQualityCommunication_AES7503.pdf |conference=125th AES Convention |publisher=[[Audio Engineering Society]] |access-date=20 October 2019 |website=[[Fraunhofer IIS]]}}</ref> MDCT has since been widely adopted in [[voice-over-IP]] (VoIP) applications, such as the [[G.729.1]] [[wideband audio]] codec introduced in 2006,<ref name="Nagireddi">{{cite book |last1=Nagireddi |first1=Sivannarayana |title=VoIP Voice and Fax Signal Processing |date=2008 |publisher=[[John Wiley & Sons]] |isbn=9780470377864 |page=69 |url=https://books.google.com/books?id=5AneeZFE71MC&pg=PA69}}</ref> [[Apple Inc.|Apple]]'s [[FaceTime]] (using AAC-LD) introduced in 2010,<ref name="AppleInsider standards 1">{{cite web|url=http://www.appleinsider.com/articles/10/06/08/inside_iphone_4_facetime_video_calling.html|date=June 8, 2010|access-date=June 9, 2010|title=Inside iPhone 4: FaceTime video calling|publisher=[[AppleInsider]]|author=Daniel Eran Dilger}}</ref> and the [[CELT]] codec introduced in 2011.<ref name="presentation">[http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv Presentation of the CELT codec] {{Webarchive|url=https://web.archive.org/web/20110807182250/http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv |date=2011-08-07 }} by Timothy B. Terriberry (65 minutes of video, see also [http://www.celt-codec.org/presentations/misc/lca-celt.pdf presentation slides] in PDF)</ref>
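
The property that makes the MDCT attractive for coding, namely that 50%-overlapping blocks of 2N samples each yield only N coefficients yet still allow exact reconstruction, can be checked with a direct (slow) implementation. The sketch below is a plain MDCT with a sine window, not the low-delay LD-MDCT variant, and it omits the quantization step a real codec would place between analysis and synthesis. Function names and block size are illustrative assumptions.

```python
import math

def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    """Transform 2N (windowed) samples into N coefficients."""
    half = len(block) // 2
    shift = 0.5 + half / 2
    return [sum(block[n] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                for n in range(2 * half))
            for k in range(half)]

def imdct(coeffs):
    """Transform N coefficients back into 2N time-aliased samples."""
    half = len(coeffs)
    shift = 0.5 + half / 2
    return [(2.0 / half) * sum(coeffs[k]
                               * math.cos(math.pi / half * (n + shift) * (k + 0.5))
                               for k in range(half))
            for n in range(2 * half)]

def analysis_synthesis(signal, half):
    """Window, transform, inverse-transform, window again and overlap-add."""
    window = sine_window(2 * half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        block = [signal[start + n] * window[n] for n in range(2 * half)]
        recon = imdct(mdct(block))
        for n in range(2 * half):
            out[start + n] += recon[n] * window[n]
    return out

HALF = 8  # N: coefficients per block (illustrative)
test_signal = [math.sin(0.4 * n) + 0.3 * math.cos(1.7 * n) for n in range(64)]
rebuilt = analysis_synthesis(test_signal, HALF)
worst = max(abs(rebuilt[i] - test_signal[i]) for i in range(HALF, 64 - HALF))
print("largest interior reconstruction error:", worst)
```

Only samples covered by two overlapping blocks are reconstructed, so the first and last N samples of the signal are excluded from the comparison.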
 
[[Opus (audio format)|Opus]] is a [[free software]] audio coder. It combines the speech-oriented LPC-based [[SILK]] algorithm and the lower-latency MDCT-based CELT algorithm, switching between or combining them as needed for maximal efficiency.<ref name="homepage">{{cite web |url = https://opus-codec.org/ |title=Opus Codec |work=Opus |publisher=Xiph.org Foundation |type=Home page |access-date=July 31, 2012 }}</ref><ref>{{cite conference |last1=Valin |first1=Jean-Marc |last2=Maxwell |first2=Gregory |last3=Terriberry |first3=Timothy B. |last4=Vos |first4=Koen |title=High-Quality, Low-Delay Music Coding in the Opus Codec |conference=135th AES Convention |publisher=[[Audio Engineering Society]] |date=October 2013 |arxiv=1602.04845 }}</ref> It is widely used for VoIP calls in [[WhatsApp]].<ref name="Register">{{cite news |last1=Leyden |first1=John |title=WhatsApp laid bare: Info-sucking app's innards probed |url=https://www.theregister.co.uk/2015/10/27/whatsapp_forensic_analysis/ |access-date=19 October 2019 |work=[[The Register]] |date=27 October 2015}}</ref><ref name="Hazra">{{cite book |last1=Hazra |first1=Sudip |last2=Mateti |first2=Prabhaker |chapter=Challenges in Android Forensics |editor-last1=Thampi |editor-first1=Sabu M. |editor-last2=Pérez |editor-first2=Gregorio Martínez |editor-last3=Westphall |editor-first3=Carlos Becker |editor-last4=Hu |editor-first4=Jiankun |editor-last5=Fan |editor-first5=Chun I. 
|editor-last6=Mármol |editor-first6=Félix Gómez |title=Security in Computing and Communications: 5th International Symposium, SSCC 2017 |date=September 13–16, 2017 |publisher=Springer |isbn=9789811068980 |pages=286–299 (290) |doi=10.1007/978-981-10-6898-0_24 |chapter-url=https://books.google.com/books?id=1u09DwAAQBAJ&pg=PA290}}</ref><ref name="Srivastava">{{cite book |last1=Srivastava |first1=Saurabh Ranjan |last2=Dube |first2=Sachin |last3=Shrivastaya |first3=Gulshan |last4=Sharma |first4=Kavita |chapter=Smartphone Triggered Security Challenges: Issues, Case Studies and Prevention |journal=Cyber Security in Parallel and Distributed Computing |editor-last1=Le |editor-first1=Dac-Nhuong |editor-last2=Kumar |editor-first2=Raghvendra |editor-last3=Mishra |editor-first3=Brojo Kishore |editor-last4=Chatterjee |editor-first4=Jyotir Moy |editor-last5=Khari |editor-first5=Manju |title=Cyber Security in Parallel and Distributed Computing: Concepts, Techniques, Applications and Case Studies |date=2019 |publisher=John Wiley & Sons |isbn=9781119488057 |pages=187–206 (200) |doi=10.1002/9781119488330.ch12 |s2cid=214034702 |chapter-url=https://books.google.com/books?id=FzGtDwAAQBAJ&pg=PA200}}</ref> The [[PlayStation 4]] video game console also uses Opus for its [[PlayStation Network]] system party chat.<ref name="playstation">{{cite web|url=https://doc.dl.playstation.net/doc/ps4-oss/ |title=Open Source Software used in PlayStation4 |publisher=Sony Interactive Entertainment Inc. |access-date=2017-12-11}}{{failed verification|reason=Source does not indicate how Opus is used|date=September 2022}}</ref>
 
A number of codecs with even lower [[bit rate]]s have been demonstrated. [[Codec2]], which operates at bit rates as low as {{nowrap|450&nbsp;bit/s}}, sees use in amateur radio.<ref>{{cite web |title=GitHub - Codec2 |website=[[GitHub]] |date=November 2019 |url=https://github.com/x893/codec2}}</ref> NATO currently uses [[Mixed-excitation linear prediction|MELPe]], offering intelligible speech at {{nowrap|600&nbsp;bit/s}} and below.<ref>Alan McCree, “A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2006, pp. I 705–708, Toulouse, France</ref> Neural vocoder approaches have also emerged: [[Lyra (codec)|Lyra]] by Google gives an "almost eerie" quality at {{nowrap|3&nbsp;kbit/s}}.<ref name=":0">{{Cite web |last=Buckley |first=Ian |date=2021-04-08 |title=Google Makes Its Lyra Low Bitrate Speech Codec Public |url=https://www.makeuseof.com/google-lyra-speech-codec-public/ |access-date=2022-07-21 |website=MakeUseOf |language=en-US}}</ref> Microsoft's [[Satin (codec)|Satin]] also uses machine learning, but uses a higher tunable bitrate and is wideband.<ref name=":3">{{Cite web |last=Levent-Levi |first=Tsahi |date=2021-04-19 |title=Lyra, Satin and the future of voice codecs in WebRTC |url=https://bloggeek.me/lyra-satin-webrtc-voice-codecs/ |access-date=2022-07-21 |website=BlogGeek.me |language=en-US}}</ref>
 
===Sub-fields===
 
[[Category:Speech codecs| ]]
[[Category:Data compression]]