Revision as of 18:10, 27 August 2023, Kvng: review: cp ref from G.711
Revision as of 23:21, 13 October 2023, Kvng: review: rm unsupported and tagged for several years
Line 26:
A wide variety of other algorithms were tried at the time, mostly [[delta modulation]] variants, but after careful consideration, the A-law/μ-law algorithms were chosen by the designers of the early digital telephony systems. At the time of their design, their 33% bandwidth reduction at very low complexity made them an excellent engineering compromise. Their audio performance remains acceptable, and there was no need to replace them in the stationary phone network.{{citation needed|date=July 2023}}

In 2008, the [[G.711.1]] codec, which has a scalable structure, was standardized by the ITU-T. Its input sampling rate is 16 kHz.<ref name="g711-1-2012">{{citation |publisher=ITU-T |date=2012 |url=http://www.itu.int/rec/T-REC-G.711.1/en |title=G.711.1 : Wideband embedded extension for G.711 pulse code modulation |access-date=2022-12-24}}</ref>

== Modern speech compression ==
Much of the later work in speech compression was motivated by military research into digital communications for [[Secure voice|secure military radios]], where very low data rates were used to achieve effective operation in a hostile radio environment. At the same time, far more [[processing power]] was available, in the form of [[Very Large Scale Integration|VLSI circuits]], than had been available for earlier compression techniques. As a result, modern speech compression algorithms could use far more complex techniques than were available in the 1960s to achieve far higher compression ratios.

The most widely used speech coding algorithms are based on [[linear predictive coding]] (LPC).<ref>{{cite journal |last1=Gupta |first1=Shipra |title=Application of MFCC in Text Independent Speaker Recognition |journal=International Journal of Advanced Research in Computer Science and Software Engineering |date=May 2016 |volume=6 |issue=5 |pages=805–810 (806) |s2cid=212485331 |issn=2277-128X |url=https://pdfs.semanticscholar.org/2aa9/c2971342e8b0b1a0714938f39c406f258477.pdf |archive-url=https://web.archive.org/web/20191018231621/https://pdfs.semanticscholar.org/2aa9/c2971342e8b0b1a0714938f39c406f258477.pdf |url-status=dead |archive-date=2019-10-18 |access-date=18 October 2019}}</ref> In particular, the most common speech coding scheme is the LPC-based [[code-excited linear prediction]] (CELP) coding, which is used, for example, in the [[GSM]] standard. In CELP, the modeling is divided into two stages: a [[linear prediction|linear predictive]] stage that models the spectral envelope, and a codebook-based model of the residual of the linear predictive model. The linear prediction coefficients (LPC) are computed and quantized, usually as [[line spectral pairs]] (LSPs). In addition to the actual speech coding of the signal, it is often necessary to use [[channel coding]] for transmission, to avoid losses due to transmission errors.
To get the best overall coding results, speech coding and channel coding methods are chosen in pairs, with the more important bits in the speech data stream protected by more robust channel coding.<!--[[User:Kvng/RTH]]-->

These techniques were available through the open research literature to be used for civilian applications, allowing the creation of digital [[mobile phone network]]s with substantially higher channel capacities than the analog systems that preceded them.{{Citation needed|date=December 2019}}

The [[modified discrete cosine transform]] (MDCT), a type of [[discrete cosine transform]] (DCT) algorithm, was adapted into a speech coding algorithm called LD-MDCT, used for the [[AAC-LD]] format introduced in 1999.<ref name="Schnell">{{cite conference |last1=Schnell |first1=Markus |last2=Schmidt |first2=Markus |last3=Jander |first3=Manuel |last4=Albert |first4=Tobias |last5=Geiger |first5=Ralf |last6=Ruoppila |first6=Vesa |last7=Ekstrand |first7=Per |last8=Grill |first8=Bernhard |date=October 2008 |title=MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication |url=https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/conference/AES-125-Convention_AAC-ELD-NewStandardForHighQualityCommunication_AES7503.pdf |conference=125th AES Convention |publisher=[[Audio Engineering Society]] |access-date=20 October 2019 |website=[[Fraunhofer IIS]]}}</ref> MDCT has since been widely adopted in [[voice-over-IP]] (VoIP) applications, such as the [[G.729.1]] [[wideband audio]] codec introduced in 2006,<ref name="Nagireddi">{{cite book |last1=Nagireddi |first1=Sivannarayana |title=VoIP Voice and Fax Signal Processing |date=2008 |publisher=[[John Wiley & Sons]] |isbn=9780470377864 |page=69 |url=https://books.google.com/books?id=5AneeZFE71MC&pg=PA69}}</ref> [[Apple Inc.|Apple]]'s [[FaceTime]] (using AAC-LD) introduced in 2010,<ref name="AppleInsider standards 1">{{cite web|url=http://www.appleinsider.com/articles/10/06/08/inside_iphone_4_facetime_video_calling.html|date=June 8, 2010|access-date=June 9, 2010|title=Inside iPhone 4: FaceTime video calling|publisher=[[Apple community#AppleInsider|AppleInsider]]|author=Daniel Eran Dilger}}</ref> and the [[CELT]] codec introduced in 2011.<ref name="presentation">[http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv Presentation of the CELT codec] {{Webarchive|url=https://web.archive.org/web/20110807182250/http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv |date=2011-08-07 }} by Timothy B. Terriberry (65 minutes of video, see also [http://www.celt-codec.org/presentations/misc/lca-celt.pdf presentation slides] in PDF)</ref>
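
The linear predictive stage described for CELP above can be illustrated with a minimal sketch. It assumes the autocorrelation method with the Levinson-Durbin recursion, which is one common way to compute linear prediction coefficients; the text does not say which method any particular codec uses. All function names are hypothetical, and quantization, conversion to line spectral pairs and the codebook search are left out.

```python
# Illustrative sketch only: the function names are hypothetical and nothing
# here is taken from a particular codec standard.

def autocorrelation(frame, max_lag):
    """Return the autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err) with a[0] == 1.0. The predictor is
    x[n] ~ -(a[1]*x[n-1] + ... + a[order]*x[n-order]) and err is the
    remaining prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent or perfectly predictable frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_residual(frame, a):
    """Filter the frame with A(z); a CELP codebook would model this output."""
    return [
        sum(a[k] * frame[n - k] for k in range(len(a)) if n - k >= 0)
        for n in range(len(frame))
    ]


# Usage: a decaying test signal produced by a known 2nd-order all-pole filter,
# x[n] = 0.9*x[n-1] - 0.5*x[n-2], started from a unit impulse.
frame = [1.0, 0.9]
for _ in range(198):
    frame.append(0.9 * frame[-1] - 0.5 * frame[-2])

a, err = levinson_durbin(autocorrelation(frame, 2), 2)
residual = lpc_residual(frame, a)
# a should come out close to [1.0, -0.9, 0.5]; the residual is then close to
# a single unit impulse, i.e. far easier to describe than the frame itself.
```

The coefficients capture the spectral envelope of the frame, while the residual is the part that the second, codebook-based stage would have to represent.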

Speech coding: Difference between revisions