== Definition ==
As a lapped transform, the MDCT is somewhat unusual compared to other Fourier-related transforms in that it has half as many outputs as inputs (instead of the same number). In particular, it is a [[linear function]] <math>F\colon \mathbf{R}^{2N} \to \mathbf{R}^N</math> (where '''R''' denotes the set of [[real number]]s). The 2''N'' real numbers ''x''<sub>0</sub>, ..., ''x''<sub>2''N''−1</sub> are transformed into the ''N'' real numbers ''X''<sub>0</sub>, ..., ''X''<sub>''N''−1</sub> according to the formula
: <math>X_k = \sum_{n=0}^{2N-1} x_n \cos \left[\frac{\pi}{N} \left(n+\frac{1}{2}+\frac{N}{2}\right) \left(k+\frac{1}{2}\right) \right]</math>
(The normalization coefficient in front of this transform, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.)
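The definition can be evaluated literally. The following is a minimal Python sketch of the direct O(''N''<sup>2</sup>) sum with the unit normalization used above. The function name <code>mdct</code> is an illustrative choice, and the sketch is meant only to make the formula concrete; practical codecs use fast algorithms instead.

```python
import math

def mdct(x):
    """Direct MDCT of 2N real inputs, returning N real outputs (unit normalization)."""
    if len(x) % 2:
        raise ValueError("input length must be even (2N)")
    N = len(x) // 2
    return [
        sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

# N = 2: four inputs are mapped to two outputs.
X = mdct([1.0, 2.0, 3.0, 4.0])
```

Note that the output list has half the length of the input list, as stated in the definition.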
=== Inverse transform ===
The inverse MDCT is known as the '''IMDCT'''. Because there are different numbers of inputs and outputs, at first glance it might seem that the MDCT should not be invertible. However, perfect invertibility is achieved by ''adding'' the overlapped IMDCTs of subsequent overlapping blocks, causing the errors to ''cancel'' and the original data to be retrieved; this technique is known as ''time-domain aliasing cancellation'' ('''TDAC''').
The IMDCT transforms ''N'' real numbers ''X''<sub>0</sub>, ..., ''X''<sub>''N''−1</sub> into 2''N'' real numbers ''y''<sub>0</sub>, ..., ''y''<sub>2''N''−1</sub> according to the formula
: <math>y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos \left[\frac{\pi}{N} \left(n+\frac{1}{2}+\frac{N}{2}\right) \left(k+\frac{1}{2}\right) \right]</math>
(As with the [[Discrete_cosine_transform#DCT-IV|DCT-IV]], an orthogonal transform, the inverse has the same form as the forward transform.)
In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e., becoming 2/''N'').
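As a hedged illustration, the sketch below evaluates the unwindowed MDCT and IMDCT sums directly (with the 1/''N'' normalization) and applies them to a single block. The helper names <code>_kernel</code>, <code>mdct</code> and <code>imdct</code> are assumptions made for this example. The round trip of one block does not return the input, which is why the overlapped blocks described above are needed.

```python
import math

def _kernel(n, k, N):
    # Shared cosine kernel of the MDCT and IMDCT definitions.
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(x):
    """2N inputs -> N outputs, unit normalization."""
    N = len(x) // 2
    return [sum(x[n] * _kernel(n, k, N) for n in range(2 * N)) for k in range(N)]

def imdct(X):
    """N inputs -> 2N outputs, normalization 1/N (unwindowed case)."""
    N = len(X)
    return [sum(X[k] * _kernel(n, k, N) for k in range(N)) / N for n in range(2 * N)]

x = [1.0, 2.0, 3.0, 4.0]
y = imdct(mdct(x))                       # 2N values again, but aliased
differs = any(abs(p - q) > 1e-9 for p, q in zip(x, y))
```

Here <code>differs</code> is true: a single block is not recovered on its own, only the sum of overlapping IMDCT outputs is.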
== Window functions ==
[[file:MDCT_WF.png|thumb|upright=1.8]]
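In typical signal-compression applications, the transform properties are further improved by using a [[window function]] ''w''<sub>''n''</sub> (''n'' = 0, ..., 2''N'' − 1) that is multiplied with ''x''<sub>''n''</sub> before the MDCT and with ''y''<sub>''n''</sub> after the IMDCT.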
The transform remains invertible (that is, TDAC works), for a symmetric window ''w''<sub>''n''</sub> = ''w''<sub>2''N''−1−''n''</sub>, as long as ''w'' satisfies the Princen–Bradley condition:
: <math>w_n^2 + w_{n + N}^2 = 1</math>.
Various window functions are used. A window that produces a form known as a modulated lapped transform (MLT)<ref>H. S. Malvar, "Lapped Transforms for Efficient Transform/Subband Coding", ''IEEE Trans. on Acoustics, Speech, and Signal Processing'', vol. 38, no. 6, pp. 969–978 (Equation 22), June 1990.</ref><ref>H. S. Malvar, "Modulated QMF Filter Banks with Perfect Reconstruction", ''Electronics Letters'', vol. 26, no. 13, pp. 906–907 (Equation 13), June 1990.</ref> is given by
: <math>w_n = \sin \left[\frac{\pi}{2N} \left(n+\frac{1}{2}\right) \right]</math>
and is used for MP3 and MPEG-2 AAC, and
: <math>w_n = \sin \left( \frac{\pi}{2} \sin^2 \left[\frac{\pi}{2N} \left(n+\frac{1}{2}\right) \right] \right)</math>
for Vorbis. AC-3 uses a [[Kaiser_window#Kaiser–Bessel-derived_(KBD)_window|Kaiser–Bessel derived (KBD) window]], and MPEG-4 AAC can also use a KBD window.
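The two closed-form windows above can be checked numerically against the Princen–Bradley condition. The following Python sketch does so; the function names are illustrative, and the Hann-shaped window is included only as an assumed counter-example of a common analysis window that does ''not'' satisfy the condition.

```python
import math

def sine_window(N):
    # MLT (sine) window of length 2N.
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]

def vorbis_window(N):
    # Vorbis power-complementary window of length 2N.
    return [math.sin(math.pi / 2 * math.sin(math.pi / (2 * N) * (n + 0.5)) ** 2)
            for n in range(2 * N)]

def hann_like_window(N):
    # Hann-shaped window, used here only as a counter-example (assumption).
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) ** 2 for n in range(2 * N)]

def princen_bradley_error(w):
    """Largest deviation of w[n]^2 + w[n+N]^2 from 1 over n = 0..N-1."""
    N = len(w) // 2
    return max(abs(w[n] ** 2 + w[n + N] ** 2 - 1.0) for n in range(N))

N = 8
err_sine = princen_bradley_error(sine_window(N))
err_vorbis = princen_bradley_error(vorbis_window(N))
err_hann = princen_bradley_error(hann_like_window(N))
```

The first two errors should vanish up to floating-point rounding, while the Hann-shaped window deviates substantially.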
Note that windows applied to the MDCT are different from windows used for some other types of signal analysis, since they must fulfill the Princen–Bradley condition. One of the reasons for this difference is that MDCT windows are applied twice, for both the MDCT (analysis) and the IMDCT (synthesis).
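To illustrate the window being applied twice, the sketch below windows each block before the MDCT and again after the IMDCT, using the sine window and the 2/''N'' synthesis normalization mentioned earlier, and then overlap-adds two consecutive blocks. All function names and the sample signal are assumptions made for this example.

```python
import math

def _kernel(n, k, N):
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(x):
    N = len(x) // 2
    return [sum(x[n] * _kernel(n, k, N) for n in range(2 * N)) for k in range(N)]

def imdct(X):
    # Unwindowed normalization 1/N; the windowed case doubles it below.
    N = len(X)
    return [sum(X[k] * _kernel(n, k, N) for k in range(N)) / N for n in range(2 * N)]

def sine_window(N):
    return [math.sin(math.pi / (2 * N) * (n + 0.5)) for n in range(2 * N)]

def analyze(block, w):
    # Window first, then transform.
    return mdct([wn * xn for wn, xn in zip(w, block)])

def synthesize(X, w):
    # Inverse transform, then window again; the factor 2 gives 2/N overall.
    return [2.0 * wn * yn for wn, yn in zip(w, imdct(X))]

N = 4
signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
w = sine_window(N)
y1 = synthesize(analyze(signal[0:2 * N], w), w)      # block covering samples 0..7
y2 = synthesize(analyze(signal[N:3 * N], w), w)      # block covering samples 4..11
middle = [p + q for p, q in zip(y1[N:], y2[:N])]     # overlap-add of shared half
```

Under these assumptions, <code>middle</code> should reproduce <code>signal[N:2 * N]</code> up to rounding, since the sine window satisfies the Princen–Bradley condition.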
== Relationship to DCT-IV and origin of TDAC ==
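As can be seen by inspection of the definitions, for even ''N'' the MDCT is essentially equivalent to a DCT-IV, where the input is shifted by ''N''/2 and two ''N''-blocks of data are transformed at once.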
In order to define the precise relationship to the DCT-IV, one must realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around ''n'' = −1/2), odd at its right boundary (around ''n'' = ''N'' − 1/2), and so on (instead of periodic boundaries as for a [[discrete Fourier transform|DFT]]). This follows from the identities
: <math>\cos\left[\frac{\pi}{N} \left(-n - 1 + \frac{1}{2}\right) \left(k + \frac{1}{2}\right)\right] = \cos\left[\frac{\pi}{N} \left(n + \frac{1}{2}\right) \left(k + \frac{1}{2}\right)\right]</math>
and
: <math>\cos\left[\frac{\pi}{N} \left(2N - n - 1 + \frac{1}{2}\right) \left(k + \frac{1}{2}\right)\right] = -\cos\left[\frac{\pi}{N} \left(n + \frac{1}{2}\right) \left(k + \frac{1}{2}\right)\right].</math>
Thus, if its inputs are an array ''x'' of length ''N'', we can imagine extending this array to (''x'', −''x''<sub>''R''</sub>, −''x'', ''x''<sub>''R''</sub>, ...) and so on, where ''x''<sub>''R''</sub> denotes ''x'' in reverse order.
Consider an MDCT with 2''N'' inputs and ''N'' outputs, where we divide the inputs into four blocks (''a'', ''b'', ''c'', ''d'') each of size ''N''/2. If we shift these to the right by ''N''/2 (from the +''N''/2 term in the MDCT definition), then (''b'', ''c'', ''d'') extend past the end of the ''N'' DCT-IV inputs, so we must "fold" them back according to the boundary conditions described above.
Thus, the MDCT of 2''N'' inputs (''a'', ''b'', ''c'', ''d'') is ''exactly'' equivalent to a DCT-IV of the ''N'' inputs: (−''c''<sub>''R''</sub> − ''d'', ''a'' − ''b''<sub>''R''</sub>), where ''R'' denotes reversal as above.
(In this way, any algorithm to compute the DCT-IV can be trivially applied to the MDCT.)
Similarly, the IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own inverse), where the output is extended (via the boundary conditions) to a length 2''N'' and shifted back to the left by ''N''/2. The inverse DCT-IV would simply give back the inputs (−''c''<sub>''R''</sub> − ''d'', ''a'' − ''b''<sub>''R''</sub>) from above. When this is extended via the boundary conditions and shifted, one obtains:
: IMDCT(MDCT(''a'', ''b'', ''c'', ''d'')) = (''a'' − ''b''<sub>''R''</sub>, ''b'' − ''a''<sub>''R''</sub>, ''c'' + ''d''<sub>''R''</sub>, ''d'' + ''c''<sub>''R''</sub>)/2.
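Grouping the inputs into larger blocks ''A'' = (''a'', ''b'') and ''B'' = (''c'', ''d''), each of size ''N'', the IMDCT of the MDCT can be written more simply as
: IMDCT(MDCT(''A'', ''B'')) = (''A'' − ''A''<sub>''R''</sub>, ''B'' + ''B''<sub>''R''</sub>)/2.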
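One can now understand how TDAC works. Suppose that one computes the MDCT of the subsequent, 50% overlapped, 2''N'' block (''B'', ''C''). The IMDCT will then yield, analogous to the above: (''B'' − ''B''<sub>''R''</sub>, ''C'' + ''C''<sub>''R''</sub>)/2. When this is added to the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply ''B'', recovering the original data.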
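The cancellation can be seen in a small numerical experiment. The sketch below is an unwindowed illustration with directly evaluated sums, and the names and sample values are assumptions. It transforms the overlapping blocks (''A'', ''B'') and (''B'', ''C''), inverts each, and adds the two IMDCT outputs over the shared half.

```python
import math

def _kernel(n, k, N):
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))

def mdct(x):
    N = len(x) // 2
    return [sum(x[n] * _kernel(n, k, N) for n in range(2 * N)) for k in range(N)]

def imdct(X):
    N = len(X)
    return [sum(X[k] * _kernel(n, k, N) for k in range(N)) / N for n in range(2 * N)]

N = 4
A = [3.0, 1.0, 4.0, 1.0]
B = [5.0, 9.0, 2.0, 6.0]
C = [5.0, 3.0, 5.0, 8.0]

y_ab = imdct(mdct(A + B))     # second half carries (B + B_R)/2
y_bc = imdct(mdct(B + C))     # first half carries (B - B_R)/2

# Overlap-add: the time-reversed (aliased) terms cancel, leaving B.
recovered = [p + q for p, q in zip(y_ab[N:], y_bc[:N])]
```

Each half on its own is aliased, but <code>recovered</code> should match <code>B</code> up to rounding.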
=== Origin of TDAC ===
''a'' and of ''b''<sub>''R''</sub> to the MDCT of (''a'', ''b'', ''c'', ''d''), or equivalently, to
the result of
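: IMDCT(MDCT(''a'', ''b'', ''c'', ''d'')) = (''a'' − ''b''<sub>''R''</sub>, ''b'' − ''a''<sub>''R''</sub>, ''c'' + ''d''<sub>''R''</sub>, ''d'' + ''c''<sub>''R''</sub>)/2.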
The combinations ''c''
For
=== Smoothness and discontinuities ===
We have seen above that the MDCT of 2''N'' inputs (''a'', ''b'', ''c'', ''d'') is equivalent to a DCT-IV of the ''N'' inputs (−''c''<sub>''R''</sub> − ''d'', ''a'' − ''b''<sub>''R''</sub>).
The DCT-IV is designed for the case where the function at the right boundary is odd, and therefore the values near the right boundary are close to 0. If the input signal is smooth, this is the case: the rightmost components of ''a'' and ''b''<sub>''R''</sub> are consecutive in the input sequence (''a'', ''b'', ''c'', ''d''), and therefore their difference is small.
Let us look at the middle of the interval: if we rewrite the above expression as (−''c''<sub>''R''</sub> − ''d'', ''a'' − ''b''<sub>''R''</sub>) = (−''d'', ''a'') − (''b'', ''c'')<sub>''R''</sub>, the second term, (''b'', ''c'')<sub>''R''</sub>, gives a smooth transition in the middle.
However, in the first term, (−''d'', ''a''), there is a potential discontinuity where the right end of −''d'' meets the left end of ''a''.
This is the reason for using a window function that reduces the components near the boundaries of the input sequence (''a'', ''b'', ''c'', ''d'') towards 0.
=== TDAC for the windowed MDCT ===