review: rm unsupported and tagged for several years
m avoid bit/s wrap at slash
(13 intermediate revisions by 7 users not shown)
Line 3:
{{more citations needed|date=January 2013}}
'''Speech coding''' is an application of [[data compression]] to [[digital audio]] signals containing [[speech]]. Speech coding combines speech-specific [[parameter estimation]], based on [[audio signal processing]] techniques that model the speech signal, with generic data compression algorithms that represent the resulting model parameters in a compact bitstream.<ref>{{cite journal|first1=M. |last1=Arjona Ramírez
Common applications of speech coding are [[mobile telephony]] and [[voice over IP]] (VoIP).<ref>M. Arjona Ramírez and M. Minami, "Technology and standards for low-bit-rate vocoding methods," in The Handbook of Computer Networks, H. Bidgoli, Ed., New York: Wiley, 2011, vol. 2, pp. 447–467.</ref> The most widely used speech coding technique in mobile telephony is [[linear predictive coding]] (LPC), while the most widely used in VoIP applications are the LPC and [[modified discrete cosine transform]] (MDCT) techniques.{{Citation needed|date=December 2019}}
Line 12:
== Categories ==
Speech coders are of two classes:<ref>{{cite web |url = http://users.ece.gatech.edu/~juang/8873/Bae-LPC10.ppt |title = Soo Hyun Bae, ECE 8873 Data Compression & Modeling, Georgia Institute of Technology
# Waveform coders
#* Time-domain: [[PCM]], [[ADPCM]]
#* Frequency-domain: [[sub-band coding]], [[
# [[Vocoder]]s
#* [[Linear predictive coding]] (LPC)
Line 22:
== Sample companding viewed as a form of speech coding ==
The [[
A wide variety of other algorithms were tried at the time, mostly [[delta modulation]] variants, but after careful consideration, the A-law/μ-law algorithms were chosen by the designers of the early digital telephony systems. At the time of their design, their 33% bandwidth reduction at very low complexity was an excellent engineering compromise. Their audio performance remains acceptable, and there has been no need to replace them in the stationary phone network.{{citation needed|date=July 2023}}
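As an illustration of why companding costs so little computation, the following is a minimal sketch of μ-law compression and expansion. It uses the continuous textbook curve with μ = 255 followed by a plain 8-bit uniform quantizer, not the bit-exact segmented encoding of G.711. The function names and the 8-bit code range of −127 to 127 are assumptions made for this example.

```python
import math

MU = 255  # assumption: the mu value commonly quoted for 8-bit mu-law


def mu_law_compress(x: float, mu: int = MU) -> float:
    """Map a sample in [-1, 1] onto the logarithmic mu-law curve in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y: float, mu: int = MU) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def encode_sample(x: float, bits: int = 8) -> int:
    """Compress, then quantize uniformly to a signed integer code."""
    levels = 2 ** (bits - 1) - 1  # 127 for 8 bits
    return round(mu_law_compress(x) * levels)


def decode_sample(code: int, bits: int = 8) -> float:
    """Undo the uniform quantization, then expand back to a linear sample."""
    levels = 2 ** (bits - 1) - 1
    return mu_law_expand(code / levels)


if __name__ == "__main__":
    # Compare companded and plain uniform 8-bit quantization on a quiet sample.
    quiet = 0.01
    companded = decode_sample(encode_sample(quiet))
    uniform = round(quiet * 127) / 127
    print("companded error:", abs(companded - quiet))
    print("uniform error:  ", abs(uniform - quiet))
```

The logarithmic curve spends more of the available codes on quiet samples, which is where a uniform quantizer with the same number of bits is least accurate in relative terms.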
Line 31:
Much of the later work in speech compression was motivated by military research into digital communications for [[Secure voice|secure military radios]], where very low data rates were used to achieve effective operation in a hostile radio environment. At the same time, far more [[processing power]] was available, in the form of [[Very Large Scale Integration|VLSI circuits]], than was available for earlier compression techniques. As a result, modern speech compression algorithms could use far more complex techniques than were available in the 1960s to achieve far higher compression ratios.
The most widely used speech coding algorithms are based on [[linear predictive coding]] (LPC).<ref>{{cite journal |last1=Gupta |first1=Shipra |title=Application of MFCC in Text Independent Speaker Recognition |journal=International Journal of Advanced Research in Computer Science and Software Engineering |date=May 2016 |volume=6 |issue=5 |pages=805–810 (806) |s2cid=212485331 |issn=2277-128X |url=https://pdfs.semanticscholar.org/2aa9/c2971342e8b0b1a0714938f39c406f258477.pdf |archive-url=https://web.archive.org/web/20191018231621/https://pdfs.semanticscholar.org/2aa9/c2971342e8b0b1a0714938f39c406f258477.pdf |url-status=dead |archive-date=2019-10-18 |access-date=18 October 2019}}</ref> In particular, the most common speech coding scheme is the LPC-based [[code-excited linear prediction]] (CELP) coding, which is used, for example, in the [[GSM]] standard. In CELP, the modeling is divided into two stages: a [[linear prediction|linear predictive]] stage that models the spectral envelope and a codebook-based model of the residual of the linear predictive model. In CELP, linear prediction coefficients (LPC) are computed and quantized, usually as [[line spectral pairs]] (LSPs). In addition to the actual speech coding of the signal, it is often necessary to use [[channel coding]] for transmission, to avoid losses due to transmission errors. To get the best overall coding results, speech coding and channel coding methods are chosen in pairs, with the more important bits in the speech data stream protected by more robust channel coding.
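The linear predictive stage described above can be sketched as follows. The example estimates prediction coefficients for one frame with the autocorrelation method and the Levinson–Durbin recursion, then computes the residual that a codebook stage would go on to model. The choice of the autocorrelation method, the function names, and the synthetic test frame are assumptions made for illustration; quantization, conversion to line spectral pairs, and the codebook search are not shown.

```python
def autocorrelation(frame, max_lag):
    """Return r[0..max_lag], the autocorrelation of one frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (a, err) where the prediction of x[n] is
    sum(a[k] * x[n - 1 - k] for k in range(order)) and err is the
    remaining prediction error energy.
    """
    a = [0.0] * order
    err = r[0]
    for i in range(order):
        if err <= 0.0:  # silent or perfectly predictable frame
            break
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - sum(a[j] * r[i - j] for j in range(i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(i):
            updated[j] = a[j] - k * a[i - 1 - j]
        a = updated
        err *= 1.0 - k * k
    return a, err


def prediction_residual(frame, a):
    """Subtract the linear prediction from each sample of the frame."""
    residual = []
    for n, x in enumerate(frame):
        predicted = sum(a[k] * frame[n - 1 - k]
                        for k in range(len(a)) if n - 1 - k >= 0)
        residual.append(x - predicted)
    return residual


if __name__ == "__main__":
    # Hypothetical test frame: a decaying exponential, which a first-order
    # predictor should capture almost completely.
    frame = [0.9 ** n for n in range(200)]
    r = autocorrelation(frame, 2)
    coeffs, err = levinson_durbin(r, 2)
    residual = prediction_residual(frame, coeffs)
    print("coefficients:", coeffs)
    print("signal energy:", r[0])
    print("residual energy:", sum(e * e for e in residual))
```

For a well-predicted frame, the residual carries far less energy than the signal itself, which is why it can be represented compactly.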
The [[modified discrete cosine transform]] (MDCT)
[[Opus (audio format)|Opus]] is a [[free software]] audio coder. It combines
A number of codecs with even lower
===Sub-fields===
Line 44:
** [[AMR-WB]] for [[WCDMA]] networks
** [[VMR-WB]] for [[CDMA2000]] networks
** [[Speex]], IP-MR, [[SILK]] (part of [[Opus (audio format)|Opus]]), and [[Unified Speech and Audio Coding|USAC/xHE-AAC]] for
* [[Modified discrete cosine transform]] (MDCT)
** [[AAC-LD]], [[G.722.1]], [[G.729.1]], [[CELT]] and [[Opus (audio format)|Opus]] for VoIP and videoconferencing
* [[Adaptive differential pulse-code modulation]] (ADPCM)
** [[G.722]] for VoIP
Line 58:
** [[FNBDT]] for military applications
** [[Selectable Mode Vocoder|SMV]] for [[CDMA]] networks
** [[Full Rate]], [[Half Rate]], [[Enhanced
** [[G.723.1]], [[G.728]], [[G.729]], [[G.729.1]] and [[iLBC]] for VoIP or videoconferencing
* ADPCM
** [[G.726]] for VoIP
* [[Multi-Band Excitation]] (MBE)
** [[Multi-Band Excitation|AMBE+]] for [[digital radio|digital]] [[mobile radio]] and [[satellite
** [[Codec 2]]
Line 83:
[[Category:Speech codecs| ]]
[[Category:Data compression]]