Speech coding: Difference between revisions

review: rm unsupported and tagged for several years
review: ce for focus and clarity
Line 31:
Much of the later work in speech compression was motivated by military research into digital communications for [[Secure voice|secure military radios]], where very low data rates were used to achieve effective operation in a hostile radio environment. At the same time, far more [[processing power]] was available, in the form of [[Very Large Scale Integration|VLSI circuits]], than was available for earlier compression techniques. As a result, modern speech compression algorithms could use far more complex techniques than were available in the 1960s to achieve far higher compression ratios.
 
The most widely used speech coding algorithms are based on [[linear predictive coding]] (LPC).<ref>{{cite journal |last1=Gupta |first1=Shipra |title=Application of MFCC in Text Independent Speaker Recognition |journal=International Journal of Advanced Research in Computer Science and Software Engineering |date=May 2016 |volume=6 |issue=5 |pages=805–810 (806) |s2cid=212485331 |issn=2277-128X |url=https://pdfs.semanticscholar.org/2aa9/c2971342e8b0b1a0714938f39c406f258477.pdf |archive-url=https://web.archive.org/web/20191018231621/https://pdfs.semanticscholar.org/2aa9/c2971342e8b0b1a0714938f39c406f258477.pdf |url-status=dead |archive-date=2019-10-18 |access-date=18 October 2019}}</ref> In particular, the most common speech coding scheme is the LPC-based [[code-excited linear prediction]] (CELP) coding, which is used, for example, in the [[GSM]] standard. In CELP, the modeling is divided into two stages: a [[linear prediction|linear predictive]] stage that models the spectral envelope, and a codebook-based model of the residual of the linear predictive model. The linear prediction coefficients are computed and quantized, usually as [[line spectral pairs]] (LSPs). In addition to the actual speech coding of the signal, it is often necessary to use [[channel coding]] for transmission, to avoid losses due to transmission errors. To get the best overall coding results, speech coding and channel coding methods are chosen in pairs, with the more important bits in the speech data stream protected by more robust channel coding.<!--[[User:Kvng/RTH]]-->
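
The linear predictive stage described above can be illustrated with a small sketch. It is not taken from any particular codec: it assumes the autocorrelation method with the Levinson-Durbin recursion, one common way to obtain prediction coefficients, and the function names (<code>autocorrelation</code>, <code>levinson_durbin</code>, <code>lpc_analysis</code>, <code>prediction_residual</code>) are hypothetical. Quantization, conversion to LSPs, and the codebook search of a real CELP coder are omitted.

```python
import math


def autocorrelation(frame, max_lag):
    """Return autocorrelation values r[0..max_lag] of one frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations.

    Returns (coeffs, err), where coeffs[k-1] is the weight a_k in the
    predictor x_hat[n] = sum_k a_k * x[n-k], and err is the remaining
    prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def lpc_analysis(frame, order):
    """Estimate prediction coefficients for one frame (illustrative only)."""
    return levinson_durbin(autocorrelation(frame, order), order)


def prediction_residual(frame, coeffs):
    """Subtract the linear prediction from each sample of the frame."""
    residual = []
    for n, sample in enumerate(frame):
        predicted = sum(
            a * frame[n - k]
            for k, a in enumerate(coeffs, start=1)
            if n - k >= 0
        )
        residual.append(sample - predicted)
    return residual
```

For a strongly correlated input, such as a sum of two sinusoids, the residual carries much less energy than the frame itself. This is why a CELP coder can describe the residual with a codebook entry rather than sample by sample.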
 
The [[modified discrete cosine transform]] (MDCT) is used in the LD-MDCT technique used by the [[AAC-LD]] format introduced in 1999.<ref name="Schnell">{{cite conference |last1=Schnell|first1=Markus |last2=Schmidt |first2=Markus |last3=Jander |first3=Manuel |last4=Albert |first4=Tobias |last5=Geiger |first5=Ralf |last6=Ruoppila |first6=Vesa |last7=Ekstrand |first7=Per |last8=Bernhard |first8=Grill |date=October 2008 |title=MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication |url=https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/conference/AES-125-Convention_AAC-ELD-NewStandardForHighQualityCommunication_AES7503.pdf |conference=125th AES Convention |publisher=[[Audio Engineering Society]] |access-date=20 October 2019 |website=[[Fraunhofer IIS]]}}</ref> MDCT has since been widely adopted in [[voice-over-IP]] (VoIP) applications, such as the [[G.729.1]] [[wideband audio]] codec introduced in 2006,<ref name="Nagireddi">{{cite book |last1=Nagireddi |first1=Sivannarayana |title=VoIP Voice and Fax Signal Processing |date=2008 |publisher=[[John Wiley & Sons]] |isbn=9780470377864 |page=69 |url=https://books.google.com/books?id=5AneeZFE71MC&pg=PA69}}</ref> [[Apple Inc.|Apple]]'s [[FaceTime]] (using AAC-LD) introduced in 2010,<ref name="AppleInsider standards 1">{{cite web|url=http://www.appleinsider.com/articles/10/06/08/inside_iphone_4_facetime_video_calling.html|date=June 8, 2010|access-date=June 9, 2010|title=Inside iPhone 4: FaceTime video calling|publisher=[[Apple community#AppleInsider|AppleInsider]]|author=Daniel Eran Dilger}}</ref> and the [[CELT]] codec introduced in 2011.<ref name="presentation">[http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv Presentation of the CELT codec] {{Webarchive|url=https://web.archive.org/web/20110807182250/http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv |date=2011-08-07 }} by Timothy B. Terriberry (65 minutes of video, see also [http://www.celt-codec.org/presentations/misc/lca-celt.pdf presentation slides] in PDF)</ref><!--[[User:Kvng/RTH]]-->
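
As an illustration of the transform itself, the following sketch implements a direct (slow) MDCT and its inverse with a sine window and 50% overlap-add. It is a textbook form, not the low-delay variant used in AAC-LD or the implementation of any codec named above. The function names and the block size are hypothetical choices for this example.

```python
import math


def sine_window(block_len):
    """Sine window; it satisfies w[n]^2 + w[n + N]^2 = 1 for N = block_len / 2."""
    return [math.sin(math.pi * (n + 0.5) / block_len) for n in range(block_len)]


def mdct(block, window):
    """Map a windowed block of 2N samples to N coefficients."""
    block_len = len(block)
    half = block_len // 2
    shift = 0.5 + half / 2
    return [
        sum(
            window[n] * block[n]
            * math.cos(math.pi / half * (n + shift) * (k + 0.5))
            for n in range(block_len)
        )
        for k in range(half)
    ]


def imdct(coeffs, window):
    """Map N coefficients back to 2N windowed, time-aliased samples."""
    half = len(coeffs)
    shift = 0.5 + half / 2
    return [
        window[n] * (2.0 / half) * sum(
            coeffs[k] * math.cos(math.pi / half * (n + shift) * (k + 0.5))
            for k in range(half)
        )
        for n in range(2 * half)
    ]


def analysis_synthesis(signal, half):
    """Transform overlapping blocks and overlap-add the inverse transforms."""
    window = sine_window(2 * half)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * half + 1, half):
        block = signal[start:start + 2 * half]
        rebuilt = imdct(mdct(block, window), window)
        for i, value in enumerate(rebuilt):
            out[start + i] += value
    return out
```

Each block of 2N samples yields only N coefficients, yet every sample covered by two overlapping blocks is recovered exactly (up to rounding), because the aliasing terms of neighbouring blocks cancel in the overlap-add. The first and last N samples of the signal are covered by only one block and are not reconstructed in this sketch.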
 
[[Opus (audio format)|Opus]] is a [[free software]] audio coder. It combines the MDCT-based CELT and LPC-based SILK audio compression algorithms, using the latter primarily for speech.<ref>{{cite conference |last1=Valin |first1=Jean-Marc |last2=Maxwell |first2=Gregory |last3=Terriberry |first3=Timothy B. |last4=Vos |first4=Koen |title=High-Quality, Low-Delay Music Coding in the Opus Codec |conference=135th AES Convention |publisher=[[Audio Engineering Society]] |date=October 2013 |arxiv=1602.04845 }}</ref> It is widely used for VoIP calls in [[WhatsApp]].<ref name="Register">{{cite news |last1=Leyden |first1=John |title=WhatsApp laid bare: Info-sucking app's innards probed |url=https://www.theregister.co.uk/2015/10/27/whatsapp_forensic_analysis/ |access-date=19 October 2019 |work=[[The Register]] |date=27 October 2015}}</ref><ref name="Hazra">{{cite book |last1=Hazra |first1=Sudip |last2=Mateti |first2=Prabhaker |chapter=Challenges in Android Forensics |editor-last1=Thampi |editor-first1=Sabu M. |editor-last2=Pérez |editor-first2=Gregorio Martínez |editor-last3=Westphall |editor-first3=Carlos Becker |editor-last4=Hu |editor-first4=Jiankun |editor-last5=Fan |editor-first5=Chun I. 
|editor-last6=Mármol |editor-first6=Félix Gómez |title=Security in Computing and Communications: 5th International Symposium, SSCC 2017 |date=September 13–16, 2017 |publisher=Springer |isbn=9789811068980 |pages=286–299 (290) |doi=10.1007/978-981-10-6898-0_24 |chapter-url=https://books.google.com/books?id=1u09DwAAQBAJ&pg=PA290}}</ref><ref name="Srivastava">{{cite book |last1=Srivastava |first1=Saurabh Ranjan |last2=Dube |first2=Sachin |last3=Shrivastaya |first3=Gulshan |last4=Sharma |first4=Kavita |chapter=Smartphone Triggered Security Challenges: Issues, Case Studies and Prevention |journal=Cyber Security in Parallel and Distributed Computing |editor-last1=Le |editor-first1=Dac-Nhuong |editor-last2=Kumar |editor-first2=Raghvendra |editor-last3=Mishra |editor-first3=Brojo Kishore |editor-last4=Chatterjee |editor-first4=Jyotir Moy |editor-last5=Khari |editor-first5=Manju |title=Cyber Security in Parallel and Distributed Computing: Concepts, Techniques, Applications and Case Studies |date=2019 |publisher=John Wiley & Sons |isbn=9781119488057 |pages=187–206 (200) |doi=10.1002/9781119488330.ch12 |s2cid=214034702 |chapter-url=https://books.google.com/books?id=FzGtDwAAQBAJ&pg=PA200}}</ref> The [[PlayStation 4]] video game console also uses Opus for its [[PlayStation Network]] system party chat.<ref name="playstation">{{cite web|url=https://doc.dl.playstation.net/doc/ps4-oss/ |title=Open Source Software used in PlayStation4 |publisher=Sony Interactive Entertainment Inc. |access-date=2017-12-11}}{{failed verification|reason=Source does not indicate how Opus is used|date=September 2022}}</ref>