Modified discrete cosine transform: Difference between revisions

Browse history interactively

← Previous edit

Content deleted Content added

VisualWikitext

Revision as of 20:57, 23 October 2019 edit Maestro2016 (talk \| contribs) Extended confirmed users 80,101 edits No edit summary ← Previous edit		Latest revision as of 10:12, 7 March 2025 edit undo Mikhail Ryazanov (talk \| contribs) Extended confirmed users 24,662 edits m →Smoothness and discontinuities: punct., fmt.
(30 intermediate revisions by 15 users not shown)
Line 1: {{short description\|Mathematical transform using in signal processing}} {{Redirect\|MDCT\|the form of [[medical imaging]] \|Multidetector computed tomography}} The '''modified discrete cosine transform''' ('''MDCT''') is a transform based on the type-IV [[discrete cosine transform]] (DCT-IV), with the additional property of being [[lapped transform\|lapped]]: it is designed to be performed on consecutive blocks of a larger [[dataset]], where subsequent blocks are overlapped so that the last half of one block coincides with the first half of the next block. This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid [[compression artifact\|artifacts]] stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used [[lossy compression]] technique in [[audio data compression]]. It is employed in most modern [[audio coding standards]], including [[MP3]], [[Dolby Digital]] (AC-3), [[Vorbis]] (Ogg), [[Windows Media Audio]] (WMA), [[ATRAC]], [[Cook codec\|Cook]], [[Advanced ~~audio~~Audio ~~coding\|AAC~~Coding]] (AAC),<ref name="Luo">{{cite book \|last1=Luo \|first1=Fa-Long \|title=Mobile Multimedia Broadcasting Standards: Technology and Practice \|date=2008 \|publisher=[[~~MP4~~Springer Science & Business Media]] ~~Audio~~\|isbn=9780387782638 \|page=590 \|url=https://books.google.com/books?id=l6PovWat8SMC&pg=PA590}}</ref> [[High-Definition Coding]] (HDC),<ref>{{cite ~~and~~book \|last1=Jones \|first1=Graham A. \|last2=Layer \|first2=David H. \|last3=Osenkowsky \|first3=Thomas G. \|title=National Association of Broadcasters Engineering Handbook: NAB Engineering Handbook \|date=2013 \|publisher=[[Taylor & Francis]] \|isbn=978-1-136-03410-7 \|pages=558–9 \|url=https://books.google.com/books?id=K9N1TVhf82YC&pg=PA558}}</ref> [[LDAC (codec)\|LDAC]], [[Dolby AC-4]],<ref>{{cite web \|title=Dolby AC-4: Audio Delivery for Next-Generation Entertainment Services \|url=https://www.dolby.com/us/en/technologies/ac-4/Next-Generation-Entertainment-Services.pdf \|website=[[Dolby Laboratories]] \|date=June 2015 \|access-date=11 November 2019}}</ref> and [[MPEG-H 3D Audio]],<ref>{{cite journal \|last1=Bleidt \|first1=R. L. \|last2=Sen \|first2=D. \|last3=Niedermeier \|first3=A. \|last4=Czelhan \|first4=B. \|last5=Füg \|first5=S. \|display-authors=etal \|title=Development of the MPEG-H TV Audio System for ATSC 3.0 \|journal=IEEE Transactions on Broadcasting \|date=2017 \|volume=63 \|issue=1 \|pages=202–236 \|doi=10.1109/TBC.2017.2661258 \|s2cid=30821673 \|url=https://www.iis.fraunhofer.de/content/dam/iis/en/doc/ame/Conference-Paper/BleidtR-IEEE-2017-Development-of-MPEG-H-TV-Audio-System-for-ATSC-3-0.pdf}}</ref> as well as [[speech coding]] standards such as [[AAC-LD]] (LD-MDCT),<ref>{{cite conference \|last1=Schnell \|first1=Markus \|last2=Schmidt \|first2=Markus \|last3=Jander \|first3=Manuel \|last4=Albert \|first4=Tobias \|last5=Geiger \|first5=Ralf \|last6=Ruoppila \|first6=Vesa \|last7=Ekstrand \|first7=Per \|last8=Bernhard \|first8=Grill \|title=MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication \|conference=125th AES Convention \|date=October 2008 \|publisher=[[Audio Engineering Society]] \|url=https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/conference/AES-125-Convention_AAC-ELD-NewStandardForHighQualityCommunication_AES7503.pdf \|website=[[Fraunhofer IIS]] \|~~accessdate~~access-date=20 October 2019}}</ref> [[G.722.1]],<ref>{{cite conference \|last1=Lutzky \|first1=Manfred \|last2=Schuller \|first2=Gerald \|last3=Gayer \|first3=Marc \|last4=Krämer \|first4=Ulrich \|last5=Wabnik \|first5=Stefan \|title=A guideline to audio codec delay \|url=https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/conference/AES-116-Convention_guideline-to-audio-codec-delay_AES116.pdf \|website=[[Fraunhofer IIS]] \|conference=116th AES Convention \|publisher=[[Audio Engineering Society]] \|date=May 2004 \|access-date=24 October 2019}}</ref> [[G.729.1]],<ref name="Nagireddi">{{cite book \|last1=Nagireddi \|first1=Sivannarayana \|title=VoIP Voice and Fax Signal Processing \|date=2008 \|publisher=[[John Wiley & Sons]] \|isbn=9780470377864 \|page=69 \|url=https://books.google.com/books?id=5AneeZFE71MC&pg=PA69}}</ref> [[CELT]],<ref name="presentation">[http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv Presentation of the CELT codec] {{Webarchive\|url=https://web.archive.org/web/20110807182250/http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv \|date=2011-08-07 }} by Timothy B. Terriberry (65 minutes of video, see also [http://www.celt-codec.org/presentations/misc/lca-celt.pdf presentation slides] {{Webarchive\|url=https://web.archive.org/web/20231116105544/http://www.celt-codec.org/presentations/misc/lca-celt.pdf \|date=2023-11-16}} in PDF)</ref> and [[Opus (audio format)\|Opus]].<ref name="homepage">{{cite web \|url=http://opus-codec.org/ \|title=Opus Codec \|work=Opus \|publisher=Xiph.org Foundation \|type=Home page \|~~accessdate~~access-date=July 31, 2012}}</ref><ref name="ars-role">{{cite web \|url=https://arstechnica.com/gadgets/2012/09/newly-standardized-opus-audio-codec-fills-every-role-from-online-chat-to-music/ \|title=Newly standardized Opus audio codec fills every role from online chat to music \|first=Peter \|last=Bright \|work=[[Ars Technica]] \|date=2012-09-12 \|~~accessdate~~access-date=2014-05-28}}</ref>▼ The '''modified discrete cosine transform (MDCT)''' is a [[lapped transform]] based on the type-IV [[discrete cosine transform]] (DCT-IV), with the additional property of being ''lapped'': it is designed to be performed on consecutive blocks of a larger [[dataset]], ~~where subsequent blocks are overlapped so that~~ ~~the last half of one block coincides with the first half of the next block.~~ ▲This overlapping, in addition to the energy-compaction qualities of the DCT, makes the MDCT especially attractive for signal compression applications, since it helps to avoid [[compression artifact\|artifacts]] stemming from the block boundaries. As a result of these advantages, the MDCT is the most widely used [[lossy compression]] technique in [[audio data compression]]. It is employed in most modern [[audio coding standards]], including [[MP3]], [[Dolby Digital]] (AC-3), [[Vorbis]] (Ogg), [[Windows Media Audio]] (WMA), [[ATRAC]], [[Cook codec\|Cook]], [[Advanced audio coding\|AAC]] ([[MP4]] Audio), and [[LDAC (codec)\|LDAC]], as well as [[speech coding]] standards such as [[AAC-LD]] (LD-MDCT),<ref>{{cite conference \|last1=Schnell \|first1=Markus \|last2=Schmidt \|first2=Markus \|last3=Jander \|first3=Manuel \|last4=Albert \|first4=Tobias \|last5=Geiger \|first5=Ralf \|last6=Ruoppila \|first6=Vesa \|last7=Ekstrand \|first7=Per \|last8=Bernhard \|first8=Grill \|title=MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication \|conference=125th AES Convention \|date=October 2008 \|publisher=[[Audio Engineering Society]] \|url=https://www.iis.fraunhofer.de/content/dam/iis/de/doc/ame/conference/AES-125-Convention_AAC-ELD-NewStandardForHighQualityCommunication_AES7503.pdf \|website=[[Fraunhofer IIS]] \|accessdate=20 October 2019}}</ref> [[G.729.1]],<ref name="Nagireddi">{{cite book \|last1=Nagireddi \|first1=Sivannarayana \|title=VoIP Voice and Fax Signal Processing \|date=2008 \|publisher=[[John Wiley & Sons]] \|isbn=9780470377864 \|page=69 \|url=https://books.google.com/books?id=5AneeZFE71MC&pg=PA69}}</ref> [[CELT]],<ref name="presentation">[http://people.xiph.org/~greg/video/linux_conf_au_CELT_2.ogv Presentation of the CELT codec] by Timothy B. Terriberry (65 minutes of video, see also [http://www.celt-codec.org/presentations/misc/lca-celt.pdf presentation slides] in PDF)</ref> and [[Opus (audio format)\|Opus]].<ref name="homepage">{{cite web \|url=http://opus-codec.org/ \|title=Opus Codec \|work=Opus \|publisher=Xiph.org Foundation \|type=Home page \|accessdate=July 31, 2012}}</ref><ref name="ars-role">{{cite web \|url=https://arstechnica.com/gadgets/2012/09/newly-standardized-opus-audio-codec-fills-every-role-from-online-chat-to-music/ \|title=Newly standardized Opus audio codec fills every role from online chat to music \|first=Peter \|last=Bright \|work=[[Ars Technica]] \|date=2012-09-12 \|accessdate=2014-05-28}}</ref> The [[discrete cosine transform]] (DCT) was first proposed by [[N. Ahmed\|Nasir Ahmed]] in 1972,<ref name="Ahmed">{{cite journal \|last=Ahmed \|first=Nasir \|author-link=N. Ahmed \|title=How I Came Up With the Discrete Cosine Transform \|journal=[[Digital Signal Processing (journal)\|Digital Signal Processing]] \|date=January 1991 \|volume=1 \|issue=1 \|pages=4–5 \|doi=10.1016/1051-2004(91)90086-Z \|bibcode=1991DSP.....1....4A \|url=https://www.~~scribd~~cse.~~com~~iitd.ac.in/~~doc~~~pkalra/~~52879771~~col783-2017/DCT-History~~-How-I-Came-Up-with-the-Discrete-Cosine-Transform~~.pdf}}</ref> and demonstrated by Ahmed with T. Natarajan and [[K. R. Rao]] in 1974.<ref name="pubDCT">{{Citation \|~~first~~first1=Nasir \|~~last~~last1=Ahmed \|author1-link=N. Ahmed \|first2=T. \|last2=Natarajan \|first3=K. R. \|last3=Rao \|title=Discrete Cosine Transform \|journal=IEEE Transactions on Computers \|date=January 1974 \|volume=C-23 \|issue=1 \|pages=90–93 \|doi=10.1109/T-C.1974.223784\|s2cid=149806273 }}</ref> The MDCT was later proposed by John P. Princen, A.W. Johnson and Alan B. Bradley at the [[University of Surrey]] in 1987,<ref>{{cite ~~journal~~book \|last1=Princen \|first1=John P. \|last2=Johnson \|first2=A.W. \|last3=Bradley \|first3=Alan B. \|title~~=Subband/Transform coding using filter bank designs based on time ___domain aliasing cancellation \|journal~~=ICASSP '87. IEEE International Conference on Acoustics, Speech, and Signal Processing \|chapter=Subband/Transform coding using filter bank designs based on time ___domain aliasing cancellation \|date=1987 \|volume=12 \|pages=2161–2164 \|doi=10.1109/ICASSP.1987.1169405\|s2cid=58446992 }}</ref> following earlier work by Princen and Bradley (1986)<ref>John P. Princen, Alan B. Bradley: ''Analysis/synthesis filter bank design based on time ___domain aliasing cancellation'', IEEE Trans. Acoust. Speech Signal Processing, ''ASSP-34'' (5), 1153–1161, 1986. Described a precursor to the MDCT using a combination of discrete cosine and sine transforms.</ref> to develop the MDCT's underlying principle of '''time-___domain aliasing cancellation''' (TDAC), described below. (There also exists an analogous transform, the MDST, based on the [[discrete sine transform]], as well as other, rarely used, forms of the MDCT based on different types of DCT or DCT/DST combinations.) In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a 32-band [[polyphase quadrature filter]] (PQF) bank. The output of this MDCT is postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF filter bank. Such a combination of a filter bank with an MDCT is called a ''hybrid'' filter bank or a ''subband'' MDCT. AAC, on the other hand, normally uses a pure MDCT; only the (rarely used) [[MPEG-4 AAC-SSR]] variant (by [[Sony]]) uses a four-band PQF bank followed by an MDCT. Similar to MP3, [[ATRAC]] uses stacked [[quadrature mirror filter]]s (QMF) followed by an MDCT. Line 12 ⟶ 9: == Definition == As a lapped transform, the MDCT is ~~a bit~~somewhat unusual compared to other Fourier-related transforms in that it has half as many outputs as inputs (instead of the same number). In particular, it is a [[linear function]] <math>F\colon \mathbf{R}^{2N} \to \mathbf{R}^N</math> (where '''R''' denotes the set of [[real number]]s). The 2''N'' real numbers ''x''<sub>0</sub>, ..., ''x''<sub>2''N''-1−1</sub> are transformed into the ''N'' real numbers ''X''<sub>0</sub>, ..., ''X''<sub>''N''-1−1</sub> according to the formula: : <math>X_k = \sum_{n=0}^{2N-1} x_n \cos \left[\frac{\pi}{N} \left(n + \frac{1}{2} + \frac{N}{2}\right) \left(k + \frac{1}{2}\right) \right].</math>▼ (The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.)▼ ▲:<math>X_k = \sum_{n=0}^{2N-1} x_n \cos \left[\frac{\pi}{N} \left(n+\frac{1}{2}+\frac{N}{2}\right) \left(k+\frac{1}{2}\right) \right]</math> ▲(The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.) === Inverse transform === Line 22 ⟶ 18: The inverse MDCT is known as the '''IMDCT'''. Because there are different numbers of inputs and outputs, at first glance it might seem that the MDCT should not be invertible. However, perfect invertibility is achieved by ''adding'' the overlapped IMDCTs of subsequent overlapping blocks, causing the errors to ''cancel'' and the original data to be retrieved; this technique is known as ''time-___domain aliasing cancellation'' ('''TDAC'''). The IMDCT transforms ''N'' real numbers ''X''<sub>0</sub>, ..., ''X''<sub>''N''-1−1</sub> into 2''N'' real numbers ''y''<sub>0</sub>, ..., ''y''<sub>2''N''-1−1</sub> according to the formula: : <math>y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos \left[\frac{\pi}{N} \left(n + \frac{1}{2} + \frac{N}{2}\right) \left(k + \frac{1}{2}\right) \right].</math>▼ (Like for the [[Discrete_cosine_transform#DCT-IV\|DCT-IV]], an orthogonal transform, the inverse has the same form as the forward transform.)▼ ▲:<math>y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos \left[\frac{\pi}{N} \left(n+\frac{1}{2}+\frac{N}{2}\right) \left(k+\frac{1}{2}\right) \right]</math> ▲(Like for the [[Discrete_cosine_transform#DCT-IV\|DCT-IV]], an orthogonal transform, the inverse has the same form as the forward transform.) In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e., becoming 2/''N''). Line 36 ⟶ 31: == Window functions == [[file:MDCT_WF.png\|thumb\|upright=1.8\|~~'''~~MDCT window functions:~~'''<br>~~ blue:{{snd}} ~~Cosine~~cosine, red:{{snd}} ~~Sine-Cosine~~sine–cosine, green:{{snd}} modified ~~Kaiser-Bessel~~Kaiser–Bessel]] In typical signal-compression applications, the transform properties are further improved by using a [[window function]] ''w''<sub>''n''</sub> (''n'' = 0, ..., 2''N''−1 − 1) that is multiplied with ''x''<sub>''n''</sub> in the MDCT and with ''y''<sub>''n''</sub> in the ~~MDCT and~~ IMDCT formulas, above, in order to avoid discontinuities at the ''n'' = 0 and 2''N'' boundaries by making the function go smoothly to zero at those points. (That is, wethe window function is applied to the data ''before'' the MDCT ~~and~~or ''after'' the IMDCT.) In principle, ''x'' and ''y'' could have different window functions, and the window function could also change from one block to the next (especially for the case where data blocks of different sizes are combined), but for simplicity we consider the common case of identical window functions for equal-sized blocks. The transform remains invertible (that is, TDAC works), for a symmetric window ''w''<sub>''n''</sub> = ''w''<sub>2''N''−1−''n''</sub>, as long as ''w'' satisfies the ~~[[Princen-Bradley~~Princen–Bradley condition]]: : <math>w_n^2 + w_{n + N}^2 = 1.</math>.▼ ▲:<math>w_n^2 + w_{n + N}^2 = 1</math>. Various window functions are used. A window that produces a form known as a modulated lapped transform<ref>H. S. Malvar, "Lapped Transforms for Efficient Transform/Subband Coding", ''IEEE Trans. on Acoustics, Speech, and Signal Processing'', vol. 38, no. 6, pp. 969–978 (Equation 22), June 1990.</ref><ref>H. S. Malvar, "Modulated QMF Filter Banks with Perfect Reconstruction", ''Electronics Letters'', vol. 26, no. 13, pp. 906–907 (Equation 13), June 1990.</ref> is given by▼ :<math>w_n = \sin \left[\frac{\pi}{2N} \left(n+\frac{1}{2}\right) \right]</math>▼ ▲Various window functions are used. A window that produces a form known as a modulated lapped transform (MLT)<ref>H. S. Malvar, "Lapped Transforms for Efficient Transform/Subband Coding", ''IEEE Trans. on Acoustics, Speech, and Signal Processing'', vol.  38, no.  6, pp.  969–978 (Equation  22), June 1990.</ref><ref>H. S. Malvar, "Modulated QMF Filter Banks with Perfect Reconstruction", ''Electronics Letters'', vol.  26, no.  13, pp.  906–907 (Equation  13), June 1990.</ref> is given by ▲: <math>w_n = \sin \left[\frac{\pi}{2N} \left(n + \frac{1}{2}\right) \right]</math> and is used for MP3 and MPEG-2 AAC, and : <math>w_n = \sin \left( \frac{\pi}{2} \sin^2 \left[\frac{\pi}{2N} \left(n + \frac{1}{2}\right) \right] \right)</math>▼ for Vorbis. AC-3 uses a [[~~Kaiser~~Kaiser_window#Kaiser–Bessel-~~Bessel~~derived_(KBD)_window\|Kaiser–Bessel derived (KBD) window]], and MPEG-4 AAC can also use a KBD window.▼ Note that windows applied to the MDCT are different from windows used for some other types of signal analysis, since they must fulfill the ~~Princen-Bradley~~Princen–Bradley condition. One of the reasons for this difference is that MDCT windows are applied twice, for both the MDCT (analysis) and the IMDCT (synthesis).▼ ▲:<math>w_n = \sin \left( \frac{\pi}{2} \sin^2 \left[\frac{\pi}{2N} \left(n+\frac{1}{2}\right) \right] \right)</math> == Relationship to DCT-IV and ~~Origin~~origin of TDAC ==▼ ▲for Vorbis. AC-3 uses a [[Kaiser-Bessel derived (KBD) window]], and MPEG-4 AAC can also use a KBD window. As can be seen by inspection of the definitions, for '''even''' ''N'' the MDCT is essentially equivalent to a DCT-IV, where the input is shifted by ''N''/2 and two ''N''-blocks of data are transformed at once. By examining this equivalence more carefully, important properties like TDAC can be easily derived.▼ ▲Note that windows applied to the MDCT are different from windows used for some other types of signal analysis, since they must fulfill the Princen-Bradley condition. One of the reasons for this difference is that MDCT windows are applied twice, for both the MDCT (analysis) and the IMDCT (synthesis). In order to define the precise relationship to the DCT-IV, one must realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around ''n'' = −1/2), odd at its right boundary (around ''n'' = ''N'' − 1/2), and so on (instead of periodic boundaries as for a [[discrete Fourier transform\|DFT]]). This follows from the identities ▲== Relationship to DCT-IV and Origin of TDAC == : <math>\cos\left[\frac{\pi}{N} \left(-n - 1 + \frac{1}{2}\right) \left(k + \frac{1}{2}\right)\right] = \cos\left[\frac{\pi}{N} \left(n + \frac{1}{2}\right) \left(k + \frac{1}{2}\right)\right]</math> ▲As can be seen by inspection of the definitions, for '''even''' ''N'' the MDCT is essentially equivalent to a DCT-IV, where the input is shifted by ''N''/2 and two ''N''-blocks of data are transformed at once. By examining this equivalence more carefully, important properties like TDAC can be easily derived. and : <math>\cos\left[\frac{\pi}{N} \left(2N - n - 1 + \frac{1}{2}\right) \left(k + \frac{1}{2}\right)\right] = In order to define the precise relationship to the DCT-IV, one must realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around ''n''=−1/2), odd at its right boundary (around ''n''=''N''−1/2), and so on (instead of periodic boundaries as for a [[discrete Fourier transform\|DFT]]). This follows from the identities <math>\cos\left[\frac{\pi}{N} \left(-n-1+\frac{1}{2}\right) \left(k+\frac{1}{2}\right)\right] = \cos\left[\frac{\pi}{N} \left(n+\frac{1}{2}\right) \left(k+\frac{1}{2}\right)\right]</math> and <math>\cos\left[\frac{\pi}{N} \left(2N-n-1+\frac{1}{2}\right) \left(k+\frac{1}{2}\right)\right] = -\cos\left[\frac{\pi}{N} \left(n+\frac{1}{2}\right) \left(k+\frac{1}{2}\right)\right]</math>. Thus, if its inputs are an array ''x'' of length ''N'', we can imagine extending this array to (''x'', −''x''<sub>''R''</sub>, −''x'', ''x''<sub>''R''</sub>, ...) and so on, where ''x''<sub>''R''</sub> denotes ''x'' in reverse order. -\cos\left[\frac{\pi}{N} \left(n + \frac{1}{2}\right) \left(k + \frac{1}{2}\right)\right].</math> Thus, if its inputs are an array ''x'' of length ''N'', we can imagine extending this array to (''x'', −''x''<sub>''R''</sub>, −''x'', ''x''<sub>''R''</sub>, ...) and so on, where ''x''<sub>''R''</sub> denotes ''x'' in reverse order. Consider an MDCT with 2''N'' inputs and ''N'' outputs, where we divide the inputs into four blocks (''a'', ''b'', ''c'', ''d'') each of size ''N''/2. If we shift these to the right by ''N''/2 (from the +''N''/2 term in the MDCT definition), then (''b'', ''c'', ''d'') extend past the end of the ''N'' DCT-IV inputs, so we must "fold" them back according to the boundary conditions described above. : Thus, the MDCT of 2''N'' inputs (''a'', ''b'', ''c'', ''d'') is ''exactly'' equivalent to a DCT-IV of the ''N'' inputs: (−''c''<sub>''R''</sub> − ''d'', ''a'' − ''b''<sub>''R''</sub>), where ''R'' denotes reversal as above. (In this way, any algorithm to compute the DCT-IV can be trivially applied to the MDCT.)▼ Similarly, the IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own inverse), where the output is extended (via the boundary conditions) to a length 2''N'' and shifted back to the left by ''N''/2. The inverse DCT-IV would simply give back the inputs (−''c''<sub>''R''</sub>−''d'', ''a''−''b''<sub>''R''</sub>) from above. When this is extended via the boundary conditions and shifted, one obtains:▼ ▲(In this way, any algorithm to compute the DCT-IV can be trivially applied to the MDCT.) :IMDCT(MDCT(''a'', ''b'', ''c'', ''d'')) = (''a''−''b''<sub>''R''</sub>, ''b''−''a''<sub>''R''</sub>, ''c''+''d''<sub>''R''</sub>, ''d''+''c''<sub>''R''</sub>) / 2.▼ ▲Similarly, the IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own inverse), where the output is extended (via the boundary conditions) to a length 2''N'' and shifted back to the left by ''N''/2. The inverse DCT-IV would simply give back the inputs (−''c''<sub>''R''</sub> − ''d'', ''a'' − ''b''<sub>''R''</sub>) from above. When this is extended via the boundary conditions and shifted, one obtains: Half of the IMDCT outputs are thus redundant, as ''b''−''a''<sub>''R''</sub> = −(''a''−''b''<sub>''R''</sub>)<sub>''R''</sub>, and likewise for the last two terms. If we group the input into bigger blocks ''A'',''B'' of size ''N'', where ''A''=(''a'', ''b'') and ''B''=(''c'', ''d''), we can write this result in a simpler way:▼ ▲: IMDCT(MDCT(''a'', ''b'', ''c'', ''d'')) = (''a''~~−~~ − ''b''<sub>''R''</sub>, ''b''~~−~~ − ''a''<sub>''R''</sub>, ''c'' + ''d''<sub>''R''</sub>, ''d'' + ''c''<sub>''R''</sub>) / 2. ▲Half of the IMDCT outputs are thus redundant, as ''b'' − ''a''<sub>''R''</sub> = −(''a'' − ''b''<sub>''R''</sub>)<sub>''R''</sub>, and likewise for the last two terms. If we group the input into bigger blocks ''A'',''B'' of size ''N'', where ''A'' = (''a'', ''b'') and ''B'' = (''c'', ''d''), we can write this result in a simpler way: : IMDCT(MDCT(''A'', ''B'')) = (''A''~~−~~ − ''A''<sub>''R''</sub>, ''B'' + ''B''<sub>''R''</sub>) / 2. One can now understand how TDAC works. Suppose that one computes the MDCT of the subsequent, 50% overlapped, 2''N'' block (''B'', ''C''). The IMDCT will then yield, analogous to the above: (''B'' − ''B''<sub>''R''</sub>, ''C'' + ''C''<sub>''R''</sub>) / 2. When this is added with the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply ''B'', recovering the original data. === Origin of TDAC === Line 83 ⟶ 77: ''a'' and of ''b''<sub>''R''</sub> to the MDCT of (''a'', ''b'', ''c'', ''d''), or equivalently, to the result of : IMDCT(MDCT(''a'', ''b'', ''c'', ''d'')) = (''a''~~−~~ − ''b''<sub>''R''</sub>, ''b''~~−~~ − ''a''<sub>''R''</sub>, ''c'' + ''d''<sub>''R''</sub>, ''d'' + ''c''<sub>''R''</sub>) / 2. The combinations ''c''~~−~~ − ''d''<sub>''R''</sub> and so on, have precisely the right signs for the combinations to cancel when they are added. For '''odd''' ''N'' (which are rarely used in practice), ''N''/2 is not an integer, so the MDCT is not simply a shift permutation of a DCT-IV. In this case, the additional shift by half a sample means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is analogous to the above. === Smoothness and discontinuities === We have seen above that the MDCT of 2''N'' inputs (''a'', ''b'', ''c'', ''d'') is equivalent to a DCT-IV of the ''N'' inputs (−''c''<sub>''R''</sub> − ''d'', ''a'' − ''b''<sub>''R''</sub>). The DCT-IV is designed for the case where the function at the right boundary is odd, and therefore the values near the right boundary are close to 0. If the input signal is smooth, this is the case: the rightmost components of ''a'' and ''b''<sub>''R''</sub> are consecutive in the input sequence (''a'', ''b'', ''c'', ''d''), and therefore their difference is small. ~~''c'', ''d'') is equivalent to a DCT-IV of the ''N'' inputs~~ Let us look at the middle of the interval: if we rewrite the above expression as (−''c''<sub>''R''</sub> − ''d'', ''a'' − ''b''<sub>''R''</sub>) = (−''d'', ''a'') − (''b'', ''c'')<sub>''R''</sub>, the second term, (''b'', ''c'')<sub>''R''</sub>, gives a smooth transition in the middle. ~~(−''c''<sub>''R''</sub>−''d'',~~ However, in the first term, (−''d'', ''a''), there is a potential discontinuity where the right end of −''d'' meets the left end of ''a''. ~~''a''−''b''<sub>''R''</sub>).~~ This is the reason for using a window function that reduces the components near the boundaries of the input sequence (''a'', ''b'', ''c'', ''d'') towards 0.▼ ~~The DCT-IV is designed for the~~ ~~case where the function at the right boundary is odd, and therefore~~ ~~the values near the right boundary are close to 0. If the input signal is smooth,~~ ~~this is the case: the rightmost components of ''a'' and ''b''<sub>''R''</sub> are~~ ~~consecutive in the input sequence (''a'', ''b'', ''c'', ''d''), and~~ ~~therefore their difference is small.~~ ~~Let us look at the middle of the interval:~~ ~~if we rewrite the above expression as~~ ~~(−''c''<sub>''R''</sub>−''d'',~~ ~~''a''−''b''<sub>''R''</sub>) = (−''d'', ''a'')−(''b'',''c'')<sub>''R''</sub>,~~ ~~the second term, (''b'',''c'')<sub>''R''</sub>, gives a smooth~~ ~~transition in the middle.~~ ~~However, in the first term, (−''d'', ''a''), there is a~~ ~~potential discontinuity where the right end of~~ ~~−''d'' meets the left end of ''a''.~~ ▲This is the reason for using a window function that reduces the components ~~near the boundaries of the input sequence (''a'', ''b'',~~ ~~''c'', ''d'') towards 0.~~ === TDAC for the windowed MDCT === Line 117 ⟶ 94: Above, the TDAC property was proved for the ordinary MDCT, showing that adding IMDCTs of subsequent blocks in their overlapping half recovers the original data. The derivation of this inverse property for the windowed MDCT is only slightly more complicated. Consider totwo overlapping consecutive sets of 2''N'' inputs (''A'',''B'') and (''B'',''C''), for blocks ''A'',''B'',''C'' of size ''N''. Recall from above that when <math>(A,B)</math> and <math>(B,C)</math> are MDCTed, IMDCTed, and added in their overlapping half, we obtain <math>(B+B_R) / 2 + (B-B_R) / 2 = B</math>, the original data. Line 162 ⟶ 139: [[Category:Fourier analysis]] [[Category:Discrete transforms]] [[Category:Data compression]]