Speech coding
The techniques employed in speech coding are similar to those used in [[audio data compression]] and [[audio coding]] where appreciation of [[psychoacoustics]] is used to transmit only data that is relevant to the human auditory system. For example, in [[voiceband]] speech coding, only information in the frequency band 400 to 3500 Hz is transmitted but the reconstructed signal retains adequate [[Intelligibility (communication)|intelligibility]].
 
Speech coding differs from other forms of audio coding in that speech is a simpler signal than other audio signals, and statistical information is available about the properties of speech. As a result, some auditory information that is relevant in general audio coding can be unnecessary in the speech coding context. Speech coding stresses the preservation of intelligibility and ''pleasantness'' of speech while using a constrained amount of transmitted data.<ref>P. Kroon, "Evaluation of speech coders," in Speech Coding and Synthesis, W. Bastiaan Kleijn and K. K. Paliwal, Ed., Amsterdam: Elsevier Science, 1995, pp. 467-494.</ref> In addition, most speech applications require low coding delay, as [[Latency (audio)|latency]] interferes with speech interaction.<ref>J. H. Chen, R. V. Cox, Y.-C. Lin, N. S. Jayant, and M. J. Melchner, A low-delay CELP coder for the CCITT 16 kb/s speech coding standard. IEEE J. Select. Areas Commun. 10(5): 830-849, June 1992.</ref><!--[[User:Kvng/RTH]]-->
 
== Categories ==
Speech coders are of two classes:<ref>{{cite web |url = http://users.ece.gatech.edu/~juang/8873/Bae-LPC10.ppt |title = Soo Hyun Bae, ECE 8873 Data Compression & Modeling, Georgia Institute of Technology, 2004 |archive-url=https://web.archive.org/web/20060907225836/http://users.ece.gatech.edu/~juang/8873/Bae-LPC10.ppt |archive-date=7 September 2006 |url-status=dead}}</ref>
# Waveform coders
#* Time-___domain: [[PCM]], [[ADPCM]]
 
== Sample companding viewed as a form of speech coding ==
The [[A-law algorithm|A-law]] and [[μ-law algorithm]]s (used in [[G.711]]) used in traditional [[Pulse-code modulation|PCM]] [[digital telephony]] can be seen as a precursor of speech encoding, requiring only 8 bits per sample but giving effectively 12 [[audio bit depth|bits of resolution]].<ref>N. S. Jayant and P. Noll, Digital coding of waveforms. Englewood Cliffs: Prentice-Hall, 1984.</ref> These logarithmic companding laws are consistent with human hearing perception in that low-amplitude noise is audible alongside a low-amplitude speech signal but is masked by a high-amplitude one. Although this would generate unacceptable distortion in a music signal, the peaky nature of speech waveforms, combined with the simple frequency structure of speech as a [[periodic function|periodic waveform]] having a single [[fundamental frequency]] with occasional added noise bursts, makes these very simple instantaneous compression algorithms acceptable for speech.{{dubious|discuss=Logarithmic companding for music}}<!--[[User:Kvng/RTH]]-->
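The trade-off described above can be illustrated with a short sketch. This uses the continuous μ-law formula with μ = 255 and a generic uniform quantizer; it is not the exact segmented 8-bit encoding specified in G.711, and the function names are illustrative only.

```python
import math

MU = 255  # mu-law parameter used in North American and Japanese telephony

def mu_law_compress(x):
    """Map a linear sample x in [-1, 1] through the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Invert the compression: map y in [-1, 1] back to a linear sample."""
    return math.copysign((math.exp(abs(y) * math.log1p(MU)) - 1) / MU, y)

def quantize(y, bits=8):
    """Uniformly quantize y in [-1, 1] to the given number of bits."""
    half_levels = 2 ** (bits - 1) - 1
    return round(y * half_levels) / half_levels

# A quiet speech sample: 8-bit quantization in the compressed ___domain keeps
# far more low-level detail than 8-bit quantization of the raw sample would.
sample = 0.01
decoded = mu_law_expand(quantize(mu_law_compress(sample)))
linear_only = quantize(sample)  # plain 8-bit linear quantization, for contrast
```

Here `decoded` stays within a fraction of a percent of the original quiet sample, while the purely linear 8-bit version loses roughly a fifth of its value: the logarithmic curve spends its 256 levels where speech energy actually lives.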
 
A wide variety of other algorithms were tried at the time, mostly [[delta modulation]] variants, but after careful consideration, the designers of the early digital telephony systems chose the A-law/μ-law algorithms. At the time of their design, their 33% bandwidth reduction at very low complexity made an excellent engineering compromise. Their audio performance remains acceptable, and there was no need to replace them in the stationary phone network.