{{Short description|Lossy audio compression applied to human speech}}
{{Use American English|date=May 2022}}
{{more citations needed|date=January 2013}}
'''Speech coding''' is an application of [[data compression]] to [[digital audio]] signals containing speech.
The techniques employed in speech coding are similar to those used in [[audio data compression]] and [[audio coding]] where
Speech coding differs from other forms of audio coding in that speech is a simpler signal than
== Categories ==
Speech coders are of two classes:
# Waveform coders
#* Time-domain: [[PCM]], [[ADPCM]]
#* Frequency-domain: [[sub-band coding]], [[
# [[Vocoder]]s
#* [[Linear predictive coding]] (LPC)
#* [[Formant synthesis|Formant coding]]
#* [[Machine learning]], i.e. [[Deep learning speech synthesis#Neural vocoder|neural vocoder]]<ref>{{cite journal |last1=Zeghidour |first1=Neil |last2=Luebs |first2=Alejandro |last3=Omran |first3=Ahmed |last4=Skoglund |first4=Jan |last5=Tagliasacchi |first5=Marco |title=SoundStream: An End-to-End Neural Audio Codec |journal=IEEE/ACM Transactions on Audio, Speech, and Language Processing |date=2022 |volume=30 |pages=495–507 |doi=10.1109/TASLP.2021.3129994|arxiv=2107.03312|s2cid=236149944 }}</ref>
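As a rough illustration of a time-domain waveform coder, the following is a minimal sketch of plain differential PCM (DPCM) with a fixed first-order predictor and a fixed uniform quantizer. It is a simplification assumed for illustration only: ADPCM proper (as in G.726) adapts the quantizer step size, and usually the predictor, to the signal, which this sketch does not do. The function names, test tone and step size are hypothetical.

```python
import math


def dpcm_encode(samples, step):
    """Encode each sample as a quantized difference from the previous reconstruction."""
    codes = []
    prediction = 0.0  # first-order predictor: the previously reconstructed sample
    for x in samples:
        code = round((x - prediction) / step)  # uniform quantizer on the prediction error
        codes.append(code)
        prediction += code * step  # mirror the decoder so errors do not accumulate
    return codes


def dpcm_decode(codes, step):
    """Rebuild the waveform by accumulating dequantized differences."""
    out = []
    prediction = 0.0
    for code in codes:
        prediction += code * step
        out.append(prediction)
    return out


if __name__ == "__main__":
    # Hypothetical input: a 200 Hz tone sampled at 8 kHz, coded with a step size of 0.05
    signal = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
    codes = dpcm_encode(signal, 0.05)
    decoded = dpcm_decode(codes, 0.05)
    print("largest code:", max(abs(c) for c in codes))
    print("max error:", max(abs(a - b) for a, b in zip(signal, decoded)))
```

Because the encoder predicts from its own reconstruction rather than from the input, the error on every sample stays within half a step (the sketch never clips codes). The transmitted codes stay small (within ±4 for this tone), so they need fewer bits than the samples themselves would at the same step size.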
== Sample companding viewed as a form of speech coding ==
The [[
Many other algorithms, mostly [[delta modulation]] variants, were tried at the time, but after careful consideration the designers of the early digital telephony systems chose the A-law and μ-law algorithms. When they were designed, their 33% bandwidth reduction at very low complexity was an excellent engineering compromise. Their audio performance remains acceptable, and there has been no need to replace them in the fixed-line telephone network.{{citation needed|date=July 2023}}
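The idea behind μ-law companding can be sketched with the continuous curve F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ), using μ = 255, followed by uniform 8-bit quantization of F(x). This is a simplification under stated assumptions: G.711 itself specifies a segmented, piecewise-linear approximation of this curve with its own bit layout, which the sketch does not reproduce, and the helper names are hypothetical. The closing arithmetic shows how a 33% figure arises if 12-bit linear PCM at 8 kHz is assumed as the baseline.

```python
import math

MU = 255.0  # value of mu used for mu-law in ITU-T G.711


def mu_law_compress(x):
    """Map a sample in [-1, 1] onto [-1, 1] with the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x):
    """Uniformly quantize the companded value to an integer code in -127..127."""
    return round(mu_law_compress(x) * 127)


def decode_8bit(code):
    """Map an integer code back to a linear sample value."""
    return mu_law_expand(code / 127)


if __name__ == "__main__":
    # Quiet and loud samples get similar *relative* error after companding
    for x in (0.01, 0.1, 0.9):
        x_hat = decode_8bit(encode_8bit(x))
        print(f"{x:5.2f} -> {x_hat:.5f}  relative error {abs(x_hat - x) / x:.2%}")

    # Bit-rate arithmetic at 8 kHz, assuming 12-bit linear PCM as the baseline
    linear_bps = 8000 * 12     # 96,000 bit/s
    companded_bps = 8000 * 8   # 64,000 bit/s
    print("reduction:", 1 - companded_bps / linear_bps)  # about one third
```

The logarithmic curve spends more of the 256 code values on small amplitudes, so quiet and loud samples end up with comparable relative error, which is what allows 8 bits per sample to stand in for a wider linear range.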
In 2008, ITU-T standardized the [[G.711.1]] codec, a wideband extension of G.711 with a scalable (embedded) structure and an input sampling rate of 16 kHz.<ref name="g711-1-2012">{{citation |publisher=ITU-T |date=2012 |url=http://www.itu.int/rec/T-REC-G.711.1/en |title=G.711.1 : Wideband embedded extension for G.711 pulse code modulation |access-date=2022-12-24}}</ref>
== Modern speech compression ==
Much of the later work in speech compression was motivated by military research into digital communications for [[Secure voice|secure military radios]], where very low data rates were
The most widely used speech coding algorithms are based on [[linear predictive coding]] (LPC).<ref>{{cite journal |last1=Gupta |first1=Shipra |title=Application of MFCC in Text Independent Speaker Recognition |journal=International Journal of Advanced Research in Computer Science and Software Engineering |date=May 2016 |volume=6 |issue=5 |pages=805–810 (806) |s2cid=212485331 |issn=2277-128X |url=https://pdfs.semanticscholar.org/2aa9/c2971342e8b0b1a0714938f39c406f258477.pdf |archive-url=https://web.archive.org/web/20191018231621/https://pdfs.semanticscholar.org/2aa9/c2971342e8b0b1a0714938f39c406f258477.pdf |url-status=dead |archive-date=2019-10-18 |access-date=18 October 2019}}</ref> In particular, the most common speech coding scheme is LPC-based [[code-excited linear prediction]] (CELP), which is used, for example, in the [[GSM]] standard. CELP divides the modeling into two stages: a [[linear prediction|linear predictive]] stage that models the spectral envelope, and a codebook-based model of the residual left by the linear predictor. The linear prediction coefficients (LPC) are computed and quantized, usually as [[line spectral pairs]] (LSPs). In addition to the speech coding itself, transmission often requires [[channel coding]] to avoid losses due to transmission errors. To get the best overall results, speech coding and channel coding methods are chosen in pairs, with the more important bits in the speech data stream protected by more robust channel coding.
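The linear predictive stage can be illustrated with the autocorrelation method: compute the autocorrelation of a frame, solve the normal equations with the Levinson-Durbin recursion, and inverse-filter the frame to obtain the residual. This is a minimal sketch that assumes a single frame with no analysis window, no bandwidth expansion and no LSP conversion or quantization; the function names and the synthetic AR(2) test signal are hypothetical, not taken from any standard.

```python
import math
import random


def autocorrelation(x, max_lag):
    """r[k] = sum over n of x[n] * x[n - k], for k = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with x[n] ~ sum_k a[k-1] * x[n-k]."""
    a = [0.0] * (order + 1)  # a[0] unused so that a[k] pairs with lag k
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # prediction error energy shrinks at each order
    return a[1:], err


def lpc_residual(x, a):
    """Inverse-filter x: e[n] = x[n] - sum_k a[k-1] * x[n-k] (zeros before the frame)."""
    p = len(a)
    return [
        x[n] - sum(a[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
        for n in range(len(x))
    ]


if __name__ == "__main__":
    # Synthetic AR(2) "frame" standing in for speech: x[n] = 1.3 x[n-1] - 0.6 x[n-2] + noise
    rng = random.Random(0)
    x = [0.0, 0.0]
    for _ in range(4000):
        x.append(1.3 * x[-1] - 0.6 * x[-2] + rng.gauss(0.0, 1.0))
    x = x[2:]

    a, _ = levinson_durbin(autocorrelation(x, 2), 2)
    e = lpc_residual(x, a)
    gain_db = 10 * math.log10(sum(v * v for v in x) / sum(v * v for v in e))
    print("coefficients:", [round(c, 3) for c in a])  # close to [1.3, -0.6]
    print(f"prediction gain: {gain_db:.1f} dB")
```

In CELP, it is this residual that the codebook-based stage models, while the coefficients would be converted to LSPs and quantized; neither of those steps is shown here.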
The [[modified discrete cosine transform]] (MDCT)
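To illustrate how the MDCT achieves time-domain aliasing cancellation (TDAC), the following sketch implements the transform directly from its definition, X[k] = Σ x[n] cos[(π/N)(n + 1/2 + N/2)(k + 1/2)] for n = 0..2N−1 and k = 0..N−1, applies a sine window at analysis and synthesis, and overlap-adds frames that advance by N samples. It is a minimal sketch under stated assumptions: it uses an O(N²) direct sum rather than a fast algorithm, and a 2/N scaling on the inverse chosen so that the sine window gives perfect reconstruction. Real codecs use fast algorithms and their own scaling and window shapes, and the function names here are hypothetical.

```python
import math


def sine_window(length):
    """Sine window of length 2N; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(block):
    """Direct O(N^2) MDCT: 2N windowed samples -> N coefficients."""
    half = len(block) // 2
    return [
        sum(block[n] * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
            for n in range(2 * half))
        for k in range(half)
    ]


def imdct(coeffs):
    """Inverse MDCT with 2/N scaling: N coefficients -> 2N time-aliased samples."""
    half = len(coeffs)
    return [
        2.0 / half * sum(c * math.cos(math.pi / half * (n + 0.5 + half / 2) * (k + 0.5))
                         for k, c in enumerate(coeffs))
        for n in range(2 * half)
    ]


def analyze_synthesize(signal, half):
    """Window, transform, invert and overlap-add 2N-sample frames with hop N."""
    window = sine_window(2 * half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        frame = [signal[start + n] * window[n] for n in range(2 * half)]
        rebuilt = imdct(mdct(frame))
        for n in range(2 * half):
            out[start + n] += rebuilt[n] * window[n]  # time-domain aliasing cancels here
    return out


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    N = 16
    x = [rng.uniform(-1.0, 1.0) for _ in range(8 * N)]
    y = analyze_synthesize(x, N)
    # Only samples covered by two overlapping frames are fully reconstructed
    print(max(abs(y[n] - x[n]) for n in range(N, len(x) - N)))
```

Each 2N-sample frame yields only N coefficients and frames advance by N samples, so the transform is critically sampled; the aliasing each frame introduces is cancelled by its neighbor in the overlap-add.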
[[Opus (audio format)|Opus]] is a [[free software]] audio coder. It combines
A number of codecs with even lower
===Sub-fields===
** [[AMR-WB]] for [[WCDMA]] networks
** [[VMR-WB]] for [[CDMA2000]] networks
** [[Speex]], IP-MR, [[SILK]]
* [[Modified discrete cosine transform]] (MDCT)
** [[AAC-LD]], [[G.722.1]], [[G.729.1]], [[CELT]] and [[Opus (audio format)|Opus]] for VoIP and videoconferencing
* [[Adaptive differential pulse-code modulation]] (ADPCM)
** [[G.722]] for VoIP
* Neural speech coding
** [[Lyra (codec)|Lyra]] (Google): V1 uses a neural network to reconstruct speech from log-mel spectrogram features; V2 is an end-to-end [[autoencoder]].
** [[Satin (codec)|Satin]] (Microsoft)
** LPCNet (Mozilla, Xiph): neural-network reconstruction of speech from LPC features<ref>{{cite web |title=LPCNet: Efficient neural speech synthesis |url=https://github.com/xiph/LPCNet |publisher=Xiph.Org Foundation |date=8 August 2023}}</ref>
; [[Narrowband]] audio coding
** [[FNBDT]] for military applications
** [[Selectable Mode Vocoder|SMV]] for [[CDMA]] networks
** [[Full Rate]], [[Half Rate]], [[Enhanced
** [[G.723.1]], [[G.728]], [[G.729]], [[G.729.1]] and [[iLBC]] for VoIP or videoconferencing
* ADPCM
** [[G.726]] for VoIP
* [[Multi-Band Excitation]] (MBE)
** [[Multi-Band Excitation|AMBE+]] for [[digital radio|digital]] [[mobile radio]] and [[satellite
** [[Codec 2]]
[[Category:Speech codecs| ]]
[[Category:Data compression]]