Speech coding: Difference between revisions

m Automated conversion
major rewrite
'''Speech coding''' is the [[audio compression|compression]] of speech (into a [[code]]) for [[telecommunication|transmission]] using [[audio signal processing]] and [[speech processing]] techniques. The two most important applications of speech coding are [[mobile phone|mobile phones]] and [[internet phone|internet phones]].
Generally, the [[spectrum|spectral]] [[spectral envelope|envelope]] of the input [[signal]] is represented by an all-pole filter excited by a pulse train. The most common method for deriving this filter is [[linear predictive coding]] (LPC) using the autocorrelation method. However, the direct-form filter coefficients are sensitive to quantization and transmission errors, which can even make the filter unstable, and their range is not known in advance. The coefficients are therefore converted into another representation that is more tolerant of errors. Such representations include the [[line spectrum pair]] (LSP), [[log-area ratios]] (LAR) and reflection coefficients (related to [[lattice filter|lattice filters]] and the [[Levinson-Durbin recursion]]). The most widely used of these is LSP, which is used, for example, in the [[GSM]] standard.
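The analysis chain described above can be sketched in Python as follows. This is a minimal, hedged illustration rather than any standard's procedure: it estimates the LPC coefficients of one frame with the autocorrelation method and the Levinson-Durbin recursion (which yields the reflection coefficients as a by-product), rebuilds the LPC coefficients from the reflection coefficients to show that the two representations are equivalent, and converts the reflection coefficients to log-area ratios. All function names, the test filter and the frame length are hypothetical. Windowing, bandwidth expansion, quantization and the LSP conversion used in real codecs are omitted, and sign conventions for reflection coefficients and LARs differ between sources.

```python
import math


def autocorrelation(frame, order):
    """r[k] = sum_n x[n] x[n+k] for k = 0..order (samples outside the frame count as zero)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations recursively.

    Returns (a, k, err): a = [1, a1, ..., ap] defines A(z) = 1 + a1 z^-1 + ... + ap z^-p,
    k holds the reflection coefficients (convention used here: k[i-1] equals a_i at stage i),
    and err is the final prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    k = [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err
        k[i - 1] = ki
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + ki * prev[i - j]
        a[i] = ki
        err *= 1.0 - ki * ki
    return a, k, err


def reflection_to_lpc(k):
    """Step-up recursion: rebuild [1, a1, ..., ap] from the reflection coefficients."""
    a = [1.0]
    for i, ki in enumerate(k, start=1):
        prev = a + [0.0]
        a = [prev[j] + ki * prev[i - j] for j in range(i + 1)]
    return a


def log_area_ratios(k):
    """One common LAR convention, log((1 + k) / (1 - k)); requires |k| < 1."""
    return [math.log((1.0 + ki) / (1.0 - ki)) for ki in k]


def inverse_filter(x, a):
    """Prediction residual e[n] = x[n] + sum_j a[j] x[n-j], zero initial state."""
    p = len(a) - 1
    return [x[n] + sum(a[j] * x[n - j] for j in range(1, p + 1) if n - j >= 0)
            for n in range(len(x))]


def synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z): y[n] = e[n] - sum_j a[j] y[n-j]."""
    p = len(a) - 1
    y = []
    for n, e in enumerate(excitation):
        y.append(e - sum(a[j] * y[n - j] for j in range(1, p + 1) if n - j >= 0))
    return y


def pulse_train(length, period):
    """Unit impulses every `period` samples: a crude voiced excitation."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


if __name__ == "__main__":
    true_a = [1.0, -1.3, 0.8]  # hypothetical stable 2nd-order all-pole filter
    speech = synthesize(pulse_train(400, 80), true_a)
    a, k, err = levinson_durbin(autocorrelation(speech, 2), 2)
    print("LPC:", [round(c, 3) for c in a])
    print("reflection:", [round(c, 3) for c in k])
    print("LAR:", [round(c, 3) for c in log_area_ratios(k)])
```

Because the autocorrelation method guarantees reflection coefficients of magnitude below one, the log-area ratios are always finite, and the step-up recursion recovers the same direct-form filter that the Levinson-Durbin recursion produced.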
 
The techniques used in speech coding are similar to those used in [[audio compression]] and [[audio coding]], where knowledge of [[psychoacoustics]] is used to transmit only the data that is relevant to the human auditory system. For example, in narrow-band speech coding, only information in the frequency band from 400 Hz to 3500 Hz is transmitted, yet the reconstructed signal is still adequate for intelligibility. However, speech coding differs from audio coding in that much more statistical information is available about the properties of speech. In addition, some auditory information that is relevant in audio coding can be unnecessary in the speech coding context. In speech coding, the most important criterion is always preserving the intelligibility of speech with a constrained amount of transmitted data. It should be emphasised, however, that intelligibility here includes not only the literal content but also speaker identity, emotion, intonation, timbre and so on, all of which matter for perfect intelligibility.
 
The most common speech coding scheme is Code-Excited Linear Prediction (CELP), which is used, for example, in the [[GSM]] standard. In CELP, the modelling is divided into two stages: a [[linear prediction|linear predictive]] stage that models the spectral envelope, and a codebook-based model of the residual of the linear predictive model.
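The second, codebook-based stage can be illustrated with a heavily simplified Python sketch of an analysis-by-synthesis codebook search. It assumes that the linear predictive stage has already produced the filter coefficients `a` (in the form A(z) = 1 + a1 z^-1 + ...). Each candidate excitation vector is passed through the all-pole synthesis filter 1/A(z), and the entry and gain that minimize the squared error against the target signal are chosen. The random Gaussian codebook, the function names and the sizes are hypothetical. Practical CELP coders also use perceptual weighting, an adaptive (pitch) codebook, filter memory carried across subframes and structured (for example algebraic) codebooks, all of which are omitted here.

```python
import random


def synthesize(excitation, a):
    """All-pole synthesis filter 1/A(z) with zero initial state."""
    p = len(a) - 1
    y = []
    for n, e in enumerate(excitation):
        y.append(e - sum(a[j] * y[n - j] for j in range(1, p + 1) if n - j >= 0))
    return y


def make_codebook(size, length, seed=0):
    """Hypothetical stochastic codebook; a fixed seed lets encoder and decoder share it."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def search_codebook(target, codebook, a):
    """Analysis-by-synthesis search over every codebook entry.

    For a candidate c with y = synthesize(c, a), the optimal gain is <t, y> / <y, y>
    and the remaining squared error is <t, t> - <t, y>^2 / <y, y>.
    Returns (index, gain, squared_error) of the best entry.
    """
    target_energy = sum(t * t for t in target)
    best = (None, 0.0, float("inf"))
    for index, c in enumerate(codebook):
        y = synthesize(c, a)
        corr = sum(t * v for t, v in zip(target, y))
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        error = target_energy - corr * corr / energy
        if error < best[2]:
            best = (index, corr / energy, error)
    return best


def decode(index, gain, codebook, a):
    """Decoder side: scale the chosen codevector and run it through the synthesis filter."""
    return synthesize([gain * v for v in codebook[index]], a)
```

In this scheme only the codebook index and a (quantized) gain would need to be transmitted for the excitation; since the decoder holds the same codebook and filter, it can rebuild the signal as `decode` does.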
In addition to the speech coding of the signal itself, it is often necessary to use [[channel coding]] for transmission to avoid losses due to transmission errors. Usually, speech coding and channel coding methods have to be chosen in pairs to get the best overall coding results.
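Why the two coders are chosen together can be sketched with a deliberately simple scheme. In this hedged Python example, a hypothetical speech frame whose first `n_sensitive` bits are assumed to be the most important is protected with a rate-1/3 repetition code decoded by majority vote, while the remaining bits are sent unprotected. Such unequal error protection only works if the channel coder knows which bits of the speech coder's output matter most. The repetition code and all names here are stand-ins chosen for brevity; real systems such as GSM use CRCs and convolutional codes instead.

```python
def repeat_encode(bits, n=3):
    """Rate-1/n repetition code: send every bit n times."""
    return [b for b in bits for _ in range(n)]


def repeat_decode(coded, n=3):
    """Majority vote over each group of n received bits (n odd)."""
    return [1 if 2 * sum(coded[i:i + n]) > n else 0 for i in range(0, len(coded), n)]


def uep_encode(frame, n_sensitive, n=3):
    """Protect only the first n_sensitive bits; append the rest unprotected."""
    return repeat_encode(frame[:n_sensitive], n) + frame[n_sensitive:]


def uep_decode(received, n_sensitive, n=3):
    """Inverse of uep_encode: majority-decode the protected part, pass the rest through."""
    split = n_sensitive * n
    return repeat_decode(received[:split], n) + received[split:]


if __name__ == "__main__":
    frame = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical 8-bit speech frame
    sent = uep_encode(frame, n_sensitive=4)
    received = sent[:]
    for pos in (0, 4, 11, 13):  # simulated channel bit errors
        received[pos] ^= 1
    print(frame)
    print(uep_decode(received, n_sensitive=4))
```

In this run the three errors that fall in the protected part are corrected, while the error at position 13 lands in the unprotected part and corrupts the sixth bit of the frame.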
 
Major subfields:
* [[Wide-band speech coding]]
** [[GSM]]
** [[NMT]]
* [[Narrow-band speech coding]]
See also: [[Digital signal processing]], [[Speech processing]], [[Audio signal processing]], [[Data compression]], [[Telecommunication]], [[Mobile phone]], [[Psychoacoustic model]].