Modified discrete cosine transform: Difference between revisions

The '''modified discrete cosine transform''' ('''MDCT''') is a transform based on the type-IV [[discrete cosine transform]] (DCT-IV), with the additional property of being [[lapped transform|lapped]]: it is designed to be performed on consecutive blocks of a larger [[dataset]], where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid [[compression artifact|artifacts]] stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used [[lossy compression]] technique in [[audio data compression]]. It is employed in most modern [[audio coding standards]], including [[MP3]], [[Dolby Digital]] (AC-3), [[Vorbis]] (Ogg), [[Windows Media Audio]] (WMA), [[ATRAC]], [[Cook codec|Cook]], [[Advanced Audio Coding]] (AAC),<ref name="Luo">{{cite book |last1=Luo |first1=Fa-Long |title=Mobile Multimedia Broadcasting Standards: Technology and Practice |date=2008 |publisher=[[Springer Science & Business Media]] |isbn=9780387782638 |page=590 |url=https://books.google.com/books?id=l6PovWat8SMC&pg=PA590}}</ref> [[High-Definition Coding]] (HDC),<ref>{{cite book |last1=Jones |first1=Graham A. |last2=Layer |first2=David H. |last3=Osenkowsky |first3=Thomas G. 
|title=National Association of Broadcasters Engineering Handbook: NAB Engineering Handbook |date=2013 |publisher=[[Taylor & Francis]] |isbn=978-1-136-03410-7 |pages=558–9 |url=https://books.google.com/books?id=K9N1TVhf82YC&pg=PA558}}</ref> [[LDAC (codec)|LDAC]], [[Dolby AC-4]],<ref>{{cite web |title=Dolby AC-4: Audio Delivery for Next-Generation Entertainment Services |url=https://www.dolby.com/us/en/technologies/ac-4/Next-Generation-Entertainment-Services.pdf |website=[[Dolby Laboratories]] |date=June 2015 |access-date=11 November 2019}}</ref> and [[MPEG-H 3D Audio]],<ref>{{cite journal |last1=Bleidt |first1=R. L. |last2=Sen |first2=D. |last3=Niedermeier |first3=A. |last4=Czelhan |first4=B. |last5=Füg |first5=S. |display-authors=etal |title=Development of the MPEG-H TV Audio System for ATSC 3.0 |journal=IEEE Transactions on Broadcasting |date=2017 |volume=63 |issue=1 |pages=202–236 |doi=10.1109/TBC.2017.2661258 |s2cid=30821673 |url=https://www.iis.fraunhofer.de/content/dam/iis/en/doc/ame/Conference-Paper/BleidtR-IEEE-2017-Development-of-MPEG-H-TV-Audio-System-for-ATSC-3-0.pdf}}</ref> as well as [[speech coding]] standards such as [[AAC-LD]] (LD-MDCT),<ref>{{cite conference |last1=Schnell |first1=Markus |last2=Schmidt |first2=Markus |last3=Jander |first3=Manuel |last4=Albert |first4=Tobias |last5=Geiger |first5=Ralf |last6=Ruoppila |first6=Vesa |last7=Ekstrand |first7=Per |last8=Grill |first8=Bernhard |title=MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication |conference=125th AES Convention |date=October 2008 |publisher=[[Audio Engineering Society]] |url=https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/conference/AES-125-Convention_AAC-ELD-NewStandardForHighQualityCommunication_AES7503.pdf |website=[[Fraunhofer IIS]] |access-date=20 October 2019}}</ref> [[G.722.1]],<ref>{{cite conference |last1=Lutzky |first1=Manfred |last2=Schuller |first2=Gerald |last3=Gayer |first3=Marc |last4=Krämer |first4=Ulrich |last5=Wabnik |first5=Stefan 
|title=A guideline to audio codec delay |url=https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/conference/AES-116-Convention_guideline-to-audio-codec-delay_AES116.pdf |website=[[Fraunhofer IIS]] |conference=116th AES Convention |publisher=[[Audio Engineering Society]] |date=May 2004 |access-date=24 October 2019}}</ref> [[G.729.1]],<ref name="Nagireddi">{{cite book |last1=Nagireddi |first1=Sivannarayana |title=VoIP Voice and Fax Signal Processing |date=2008 |publisher=[[John Wiley & Sons]] |isbn=9780470377864 |page=69 |url=https://books.google.com/books?id=5AneeZFE71MC&pg=PA69}}</ref> [[CELT]],<ref name="presentation">[http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv Presentation of the CELT codec] {{Webarchive|url=https://web.archive.org/web/20110807182250/http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv |date=2011-08-07 }} by Timothy B. Terriberry (65 minutes of video, see also [http://www.celt-codec.org/presentations/misc/lca-celt.pdf presentation slides] {{Webarchive|url=https://web.archive.org/web/20231116105544/http://www.celt-codec.org/presentations/misc/lca-celt.pdf |date=2023-11-16}} in PDF)</ref> and [[Opus (audio format)|Opus]].<ref name="homepage">{{cite web |url=http://opus-codec.org/ |title=Opus Codec |work=Opus |publisher=Xiph.org Foundation |type=Home page |access-date=July 31, 2012}}</ref><ref name="ars-role">{{cite web |url=https://arstechnica.com/gadgets/2012/09/newly-standardized-opus-audio-codec-fills-every-role-from-online-chat-to-music/ |title=Newly standardized Opus audio codec fills every role from online chat to music |first=Peter |last=Bright |work=[[Ars Technica]] |date=2012-09-12 |access-date=2014-05-28}}</ref>
 
The [[discrete cosine transform]] (DCT) was first proposed by [[N. Ahmed|Nasir Ahmed]] in 1972,<ref name="Ahmed">{{cite journal |last=Ahmed |first=Nasir |author-link=N. Ahmed |title=How I Came Up With the Discrete Cosine Transform |journal=[[Digital Signal Processing (journal)|Digital Signal Processing]] |date=January 1991 |volume=1 |issue=1 |pages=4–5 |doi=10.1016/1051-2004(91)90086-Z |bibcode=1991DSP.....1....4A |url=https://www.cse.iitd.ac.in/~pkalra/col783-2017/DCT-History.pdf}}</ref> and demonstrated by Ahmed with T. Natarajan and [[K. R. Rao]] in 1974.<ref name="pubDCT">{{Citation |first1=Nasir |last1=Ahmed |author1-link=N. Ahmed |first2=T. |last2=Natarajan |first3=K. R. |last3=Rao |title=Discrete Cosine Transform |journal=IEEE Transactions on Computers |date=January 1974 |volume=C-23 |issue=1 |pages=90–93 |doi=10.1109/T-C.1974.223784|s2cid=149806273 }}</ref> The MDCT was later proposed by John P. Princen, A.W. Johnson and Alan B. Bradley at the [[University of Surrey]] in 1987,<ref>{{cite book |last1=Princen |first1=John P. |last2=Johnson |first2=A.W. |last3=Bradley |first3=Alan B. |title=ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing |chapter=Subband/Transform coding using filter bank designs based on time domain aliasing cancellation |date=1987 |volume=12 |pages=2161–2164 |doi=10.1109/ICASSP.1987.1169405|s2cid=58446992 }}</ref> following earlier work by Princen and Bradley (1986)<ref>John P. Princen, Alan B. Bradley: ''Analysis/synthesis filter bank design based on time domain aliasing cancellation'', IEEE Trans. Acoust. Speech Signal Processing, ''ASSP-34'' (5), 1153–1161, 1986. Described a precursor to the MDCT using a combination of discrete cosine and sine transforms.</ref> to develop the MDCT's underlying principle of '''time-domain aliasing cancellation''' (TDAC), described below. 
(There also exists an analogous transform, the MDST, based on the [[discrete sine transform]], as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.)
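
The overlapped-block structure and the aliasing cancellation mentioned above can be illustrated with a small numerical sketch. The code below is not taken from any codec; it is a direct O(N²) evaluation of the usual cosine-kernel definition of the MDCT, written for clarity rather than speed. The function names (<code>sine_window</code>, <code>mdct</code>, <code>imdct</code>, <code>overlap_add_roundtrip</code>) are hypothetical, the sine window is just one common choice that satisfies the Princen–Bradley condition, and the 2/''N'' inverse normalization is an assumption that matches that window choice.

```python
import math

def sine_window(N):
    # Assumed window: w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley condition).
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(block):
    # Forward transform: 2N time samples -> N coefficients.
    N = len(block) // 2
    return [
        sum(x * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n, x in enumerate(block))
        for k in range(N)
    ]

def imdct(coeffs):
    # Inverse transform: N coefficients -> 2N time-aliased samples.
    # The 2/N factor is an assumed normalization suited to the window above.
    N = len(coeffs)
    return [
        2.0 / N * sum(X * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                      for k, X in enumerate(coeffs))
        for n in range(2 * N)
    ]

def overlap_add_roundtrip(signal, N):
    # Window, transform, inverse-transform and overlap-add blocks of 2N
    # samples that advance by N samples (50% overlap).
    w = sine_window(N)
    tail = (-len(signal)) % N  # zero-pad to a multiple of N
    padded = [0.0] * N + list(signal) + [0.0] * (tail + N)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - N, N):
        block = [padded[start + n] * w[n] for n in range(2 * N)]
        y = imdct(mdct(block))
        for n in range(2 * N):
            out[start + n] += y[n] * w[n]  # aliasing cancels in the sum
    return out[N:N + len(signal)]

if __name__ == "__main__":
    x = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
    y = overlap_add_roundtrip(x, 8)
    print(max(abs(a - b) for a, b in zip(x, y)))  # tiny rounding error
```

Each block of 2''N'' samples yields only ''N'' coefficients, so a single block cannot be inverted on its own; the original samples reappear only once the overlapping halves of neighbouring blocks are added together. Practical implementations compute the same result with FFT-based algorithms instead of the direct sums used here.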
 
In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band [[polyphase quadrature filter]] (PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF bank. Such a combination of a filter bank with an MDCT is called a ''hybrid'' filter bank or a ''subband'' MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used) [[MPEG-4 AAC-SSR]] variant (by [[Sony]]) uses a four-band PQF bank followed by an MDCT. Like MP3, [[ATRAC]] uses a hybrid structure: stacked [[quadrature mirror filter]]s (QMF) followed by an MDCT.