Audio coding format: Difference between revisions

Undid revision 945486949 by 2409:4043:599:EC0D:BFB7:542E:A670:63 (talk)
Citation bot (talk | contribs)
Alter: url. Add: arxiv, chapter-url, title. Removed or converted URL. Converted bare reference to cite template. Some additions/deletions were actually parameter name changes. | You can use this bot yourself. Report bugs here. | Activated by Headbomb | via #UCB_webform
Line 4:
Some audio coding formats are documented by a detailed [[technical specification]] document known as an '''audio coding specification'''. Some such specifications are written and approved by [[standardization organization]]s as [[technical standard]]s, and are thus known as '''audio coding standards'''. The term "standard" is also sometimes used for [[de facto standard|''de facto'' standards]] as well as formal standards.
 
Audio content encoded in a particular audio coding format is normally encapsulated within a [[container format (digital)|container format]]. As such, the user normally does not have a raw [[Advanced Audio Coding|AAC]] file, but instead has a .m4a [[audio file format|audio file]], which is an [[MPEG-4 Part 14]] container containing AAC-encoded audio. The container also contains [[metadata]] such as title and other tags, and perhaps an index for fast seeking.<ref>{{Cite web | url=http://superuser.com/questions/357686/where-is-synchronization-information-stored-in-container-formats | title=Video - Where is synchronization information stored in container formats?}}</ref> A notable exception is [[MP3]] files, which are raw audio coding without a container format. De facto standards for adding metadata tags such as title and artist to MP3s, such as [[ID3]], are [[Hack (computer science)#In computer science|hack]]s which work by attaching the tags to the MP3 (appended at the end of the file in ID3v1, usually prepended at the start in ID3v2), and then relying on the MP3 player to recognize the chunk as malformed audio coding and therefore skip it. In video files with audio, the encoded audio content is bundled with video (in a [[video coding format]]) inside a [[multimedia container format]].
 
An audio coding format does not dictate all [[algorithm]]s used by a [[codec]] implementing the format. Lossy audio compression works in large part by removing data in ways humans cannot hear, according to a [[psychoacoustic model]]; the implementer of an encoder has some freedom to choose which data to remove (according to their own psychoacoustic model).
Line 22:
In 1950, [[Bell Labs]] filed a patent on [[differential pulse-code modulation]] (DPCM).<ref name="DPCM">{{US patent reference|inventor=C. Chapin Cutler|title=Differential Quantization of Communication Signals|number=2605361|A-Datum=1950-06-29|issue-date=1952-07-29}}</ref> [[Adaptive DPCM]] (ADPCM) was introduced by P. Cummiskey, [[Nikil Jayant|Nikil S. Jayant]] and [[James L. Flanagan]] at [[Bell Labs]] in 1973.<ref>{{cite journal |last1=Cummiskey |first1=P. |last2=Jayant |first2=Nikil S. |last3=Flanagan |first3=J. L. |title=Adaptive quantization in differential PCM coding of speech |journal=The Bell System Technical Journal |date=1973 |volume=52 |issue=7 |pages=1105–1118 |doi=10.1002/j.1538-7305.1973.tb02007.x |issn=0005-8580}}</ref>
 
[[Perceptual coding]] was first used for [[speech coding]] compression, with [[linear predictive coding]] (LPC).<ref name="Schroeder2014">{{cite book |last1=Schroeder |first1=Manfred R. |title=Acoustics, Information, and Communication: Memorial Volume in Honor of Manfred R. Schroeder |date=2014 |publisher=Springer |isbn=9783319056609 |chapter=Bell Laboratories |page=388 |chapter-url=https://books.google.com/books?id=d9IkBAAAQBAJ&pg=PA388}}</ref> Initial concepts for LPC date back to the work of [[Fumitada Itakura]] ([[Nagoya University]]) and Shuzo Saito ([[Nippon Telegraph and Telephone]]) in 1966.<ref>{{cite journal |last1=Gray |first1=Robert M. |title=A History of Realtime Digital Speech on Packet Networks: Part II of Linear Predictive Coding and the Internet Protocol |journal=Found. Trends Signal Process. |date=2010 |volume=3 |issue=4 |pages=203–303 |doi=10.1561/2000000036 |url=https://ee.stanford.edu/~gray/lpcip.pdf |issn=1932-8346}}</ref> During the 1970s, [[Bishnu S. Atal]] and [[Manfred R. Schroeder]] at [[Bell Labs]] developed a form of LPC called [[adaptive predictive coding]] (APC), a perceptual coding algorithm that exploited the masking properties of the human ear. It was followed in the early 1980s by the [[code-excited linear prediction]] (CELP) algorithm, which achieved a significant [[compression ratio]] for its time.<ref name="Schroeder2014"/> Perceptual coding is used by modern audio compression formats such as [[MP3]]<ref name="Schroeder2014"/> and [[Advanced Audio Coding|AAC]].
 
The [[discrete cosine transform]] (DCT), developed by [[N. Ahmed|Nasir Ahmed]], T. Natarajan and [[K. R. Rao]] in 1974,<ref name="DCT">{{cite journal |author1=Nasir Ahmed |author1-link=N. Ahmed |author2=T. Natarajan |author3=Kamisetty Ramamohan Rao |journal=IEEE Transactions on Computers|title=Discrete Cosine Transform|volume=C-23|issue=1|pages=90–93|date=January 1974 |doi=10.1109/T-C.1974.223784 |url=https://www.ic.tu-berlin.de/fileadmin/fg121/Source-Coding_WS12/selected-readings/Ahmed_et_al.__1974.pdf}}</ref> provided the basis for the [[modified discrete cosine transform]] (MDCT). The MDCT was proposed by J. P. Princen, A. W. Johnson and A. B. Bradley in 1987,<ref>J. P. Princen, A. W. Johnson and A. B. Bradley: ''Subband/transform coding using filter bank designs based on time domain aliasing cancellation'', IEEE Proc. Intl. Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2161–2164, 1987.</ref> following earlier work by Princen and Bradley in 1986.<ref>John P. Princen, Alan B. Bradley: ''Analysis/synthesis filter bank design based on time domain aliasing cancellation'', IEEE Trans. Acoust. Speech Signal Processing, ''ASSP-34'' (5), 1153–1161, 1986.</ref> The MDCT is used by modern audio compression formats such as [[Dolby Digital]],<ref name="Luo">{{cite book |last1=Luo |first1=Fa-Long |title=Mobile Multimedia Broadcasting Standards: Technology and Practice |date=2008 |publisher=[[Springer Science & Business Media]] |isbn=9780387782638 |page=590 |url=https://books.google.com/books?id=l6PovWat8SMC&pg=PA590}}</ref><ref>{{cite journal |last1=Britanak |first1=V. |title=On Properties, Relations, and Simplified Implementation of Filter Banks in the Dolby Digital (Plus) AC-3 Audio Coding Standards |journal=IEEE Transactions on Audio, Speech, and Language Processing |date=2011 |volume=19 |issue=5 |pages=1231–1241 |doi=10.1109/TASL.2010.2087755}}</ref> [[MP3]],<ref name="Guckert">{{cite web |last1=Guckert |first1=John |title=The Use of FFT and MDCT in MP3 Audio Compression |url=http://www.math.utah.edu/~gustafso/s2012/2270/web-projects/Guckert-audio-compression-svd-mdct-MP3.pdf |website=[[University of Utah]] |date=Spring 2012 |accessdate=14 July 2019}}</ref> and [[Advanced Audio Coding]] (AAC).<ref name=brandenburg>{{cite web|url=http://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf|title=MP3 and AAC Explained|last=Brandenburg|first=Karlheinz|year=1999|url-status=live|archiveurl=https://web.archive.org/web/20170213191747/https://graphics.ethz.ch/teaching/mmcom12/slides/mp3_and_aac_brandenburg.pdf|archivedate=2017-02-13}}</ref>
 
==List of lossy formats==
Line 42:
| 1991
| 58%
| <ref name="Luo">{{cite book |last1=Luo |first1=Fa-Long |title=Mobile Multimedia Broadcasting Standards: Technology and Practice |date=2008 |publisher=[[Springer Science & Business Media]] |isbn=9780387782638 |page=590 |url=https://books.google.com/books?id=l6PovWat8SMC&pg=PA590}}</ref><ref name="Britanak2011">{{cite journal |last1=Britanak |first1=V. |title=On Properties, Relations, and Simplified Implementation of Filter Banks in the Dolby Digital (Plus) AC-3 Audio Coding Standards |journal=IEEE Transactions on Audio, Speech, and Language Processing |date=2011 |volume=19 |issue=5 |pages=1231–1241 |doi=10.1109/TASL.2010.2087755}}</ref>
|-
| [[Adaptive Transform Acoustic Coding]]
Line 84:
| 2012
| 8%
| <ref>{{cite conference|last1=Valin|first1=Jean-Marc|last2=Maxwell|first2=Gregory|last3=Terriberry|first3=Timothy B.|last4=Vos|first4=Koen|date=October 2013|title=High-Quality, Low-Delay Music Coding in the Opus Codec|url=https://arxiv.org/pdf/1602.04845.pdf|conference=135th AES Convention|publisher=[[Audio Engineering Society]]|arxiv=1602.04845}}</ref>
|-
| [[LDAC (codec)|LDAC]]
Line 103:
| 1990
| 14%
| <ref>{{cite web |title=Digital Theater Systems Audio Formats |url=https://www.loc.gov/preservation/digital/formats/fdd/fdd000232.shtml |website=[[Library of Congress]] |accessdate=10 November 2019 |date=27 December 2011}}</ref><ref>{{cite book |last1=Spanias |first1=Andreas |last2=Painter |first2=Ted |last3=Atti |first3=Venkatraman |title=Audio Signal Processing and Coding |date=2006 |publisher=[[John Wiley & Sons]] |isbn=9780470041963 |page=338 |url=https://books.google.com/books?id=a1RULRErhOYC&pg=PA338}}</ref>
|-
| [[Master Quality Authenticated]]