Pre-echo

In audio signal processing, pre-echo is a digital audio compression artifact in which a sound is heard in the reconstructed signal (after an encoding/decoding cycle) before it occurs in the original signal, hence the name. It is most noticeable in sharp, impulsive sounds (transients) from percussion instruments such as castanets or cymbals.^[1]

Cause

It occurs in transform-based audio compression algorithms – typically based on the modified discrete cosine transform (MDCT) – such as MP3, MPEG-4 AAC, and Vorbis, and is due to quantization noise being spread over the entire transform window of the codec.

Sharp impulsive sounds (transients), once transformed into a frequency representation (for example with an FFT or an MDCT), produce a broad set of frequency components.

When the quantization errors in those components are transformed back into the time ___domain (through an inverse FFT or MDCT), the error is smeared in time across the whole transform window. As a result, additional noise can appear in the reconstructed sound before the transient's onset in the original sound.
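The mechanism can be reproduced numerically. The sketch below is a simplification, not how any particular codec works: it uses a single non-overlapping orthonormal DCT-II block in place of an overlapping MDCT, and the block length, transient position and quantizer step are arbitrary assumptions. It places a single-sample click late in an otherwise silent block, counts how many coefficients the click occupies, quantizes those coefficients, and measures the error that appears in the part of the block that was silent in the original.

```python
import math

def dct2(x):
    """Orthonormal DCT-II of a real sequence (O(N^2), for illustration only)."""
    n_len = len(x)
    out = []
    for k in range(n_len):
        scale = math.sqrt((1 if k == 0 else 2) / n_len)
        s = sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_len) for n in range(n_len))
        out.append(scale * s)
    return out

def idct2(X):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n_len = len(X)
    out = []
    for n in range(n_len):
        s = 0.0
        for k in range(n_len):
            scale = math.sqrt((1 if k == 0 else 2) / n_len)
            s += scale * X[k] * math.cos(math.pi * (n + 0.5) * k / n_len)
        out.append(s)
    return out

def quantize(coeffs, step):
    """Uniform scalar quantizer: round each coefficient to a multiple of step."""
    return [step * round(c / step) for c in coeffs]

N = 64        # block length (assumed; real codecs use longer, overlapping MDCT windows)
ONSET = 48    # position of the transient inside the block (assumed)
STEP = 0.05   # quantizer step size (assumed)

# Silence followed by a single-sample "click" late in the block.
x = [0.0] * N
x[ONSET] = 1.0

X = dct2(x)
# An impulse spreads its energy over most of the frequency bins.
significant = sum(1 for c in X if abs(c) > 0.01)

# Quantize in the transform ___domain, then go back to the time ___domain.
y = idct2(quantize(X, STEP))
err = [yn - xn for yn, xn in zip(y, x)]
# Error energy in the samples that were silent before the click: the pre-echo.
pre_echo_energy = sum(e * e for e in err[:ONSET])

print(f"{significant} of {N} coefficients are significant")
print(f"error energy before the transient: {pre_echo_energy:.2e}")
```

Because the inverse transform mixes every coefficient into every output sample of the block, the quantization error is not confined to the click's position, and part of it lands in the samples that precede it.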

The original sound (time ___domain) is shown on the top row: a sharp, percussive transient from castanets. The reconstructed (encoded/decoded) signal underneath shows smearing of the quantization error, resulting in audible pre-echo.

Audibility

The psychoacoustic component of the effect is that one hears only the echo preceding the transient, not the one following it, because the latter is drowned out by the transient. More precisely, forward temporal masking (masking of sounds that follow a loud sound) is much stronger and longer-lasting than backward temporal masking (masking of sounds that precede it), so one hears a pre-echo but no post-echo.^[2]
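A rough piece of arithmetic shows why window length matters for audibility. In the worst case, smeared quantization noise can precede a transient by up to about one transform window. The sketch below converts window lengths to milliseconds and compares them with an assumed premasking (backward masking) interval of about 20 ms; that figure, the 44.1 kHz sample rate, and the simple "spread fits inside premasking" test are assumptions for illustration, not a perceptual model. The two window lengths are those used by AAC for long and short blocks (2048 and 256 samples).

```python
SAMPLE_RATE = 44_100   # Hz (assumed CD-quality input)
PREMASK_MS = 20.0      # assumed approximate extent of backward (pre-)masking

def spread_ms(window_samples, sample_rate=SAMPLE_RATE):
    """Worst-case time, in ms, by which smeared noise can precede a transient
    when it is spread across one transform window."""
    return 1000.0 * window_samples / sample_rate

def likely_masked(window_samples):
    """Crude check: does the worst-case pre-echo fit inside the assumed premasking interval?"""
    return spread_ms(window_samples) <= PREMASK_MS

for n in (2048, 256):  # AAC long and short window lengths
    print(f"{n:5d} samples -> {spread_ms(n):5.1f} ms, masked: {likely_masked(n)}")
```

Under these assumptions a long window spreads noise over about 46 ms, well beyond the assumed premasking interval, while a short window limits it to about 5.8 ms.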

Mitigation

In an effort to avoid pre-echo artifacts, many sound processing systems use filters whose entire response occurs after the main impulse, rather than linear-phase filters, whose symmetric impulse response places part of the ringing before the main peak. Such filters necessarily introduce phase distortion and temporal smearing, but this additional distortion is less audible because of strong forward masking.
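The difference between the two kinds of filter can be quantified by measuring how much of an impulse response's energy arrives before its largest tap. The sketch below compares a linear-phase windowed-sinc lowpass FIR with a one-pole IIR lowpass; the one-pole filter is simply a convenient example of a filter whose whole response follows the impulse, not a design taken from any particular system, and the tap count, cutoff and pole value are arbitrary assumptions.

```python
import math

def linear_phase_lowpass(num_taps, cutoff):
    """Symmetric (linear-phase) Hann-windowed sinc FIR; cutoff is a fraction of the sample rate."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        t = n - mid
        sinc = 2 * cutoff if t == 0 else math.sin(2 * math.pi * cutoff * t) / (math.pi * t)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * hann)
    return taps

def one_pole_lowpass(num_taps, a):
    """Truncated impulse response of y[n] = (1 - a) x[n] + a y[n-1]: causal, peak at n = 0."""
    return [(1 - a) * a ** n for n in range(num_taps)]

def energy_before_peak(h):
    """Fraction of impulse-response energy that arrives before the largest tap."""
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    total = sum(v * v for v in h)
    return sum(v * v for v in h[:peak]) / total

lin = linear_phase_lowpass(63, 0.05)
causal = one_pole_lowpass(63, 0.8)
print(f"linear-phase FIR: {energy_before_peak(lin):.1%} of energy precedes the peak")
print(f"one-pole IIR:     {energy_before_peak(causal):.1%} of energy precedes the peak")
```

The linear-phase filter puts a large share of its ringing ahead of the peak, which is exactly the kind of energy that backward masking covers poorly; the one-pole filter puts none there, at the cost of a frequency-dependent delay.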

Avoiding pre-echo is a substantial design difficulty in transform-___domain lossy audio codecs such as MP3, MPEG-4 AAC, and Vorbis. It is also one of the problems encountered in digital room correction algorithms and in frequency-___domain filters in general (denoising by spectral subtraction, equalization, and others). One way of reducing "breathing" in filters and compression techniques that use time-to-frequency transforms is to temporarily switch to a smaller transform window (short blocks in MP3), which increases the temporal resolution of the algorithm at the cost of reducing its frequency resolution.^[1]
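The effect of switching to shorter windows can be sketched with the same kind of toy coder. The example below is a simplification under stated assumptions: it codes a signal in non-overlapping orthonormal DCT-II blocks (no MDCT overlap and no transition windows, unlike real block switching), and the signal length, click position and quantizer step are arbitrary. It codes the same click once with a single 256-sample block and once with eight 32-sample blocks, and reports the pre-echo energy and the first sample carrying non-negligible error.

```python
import math

def dct2(x):
    """Orthonormal DCT-II (O(N^2), for illustration only)."""
    n_len = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n_len)
            * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_len) for n in range(n_len))
            for k in range(n_len)]

def idct2(X):
    """Inverse of dct2 (orthonormal DCT-III)."""
    n_len = len(X)
    return [sum(math.sqrt((1 if k == 0 else 2) / n_len) * X[k]
                * math.cos(math.pi * (n + 0.5) * k / n_len) for k in range(n_len))
            for n in range(n_len)]

def code_blocks(x, block, step):
    """Transform, quantize and inverse-transform x in non-overlapping blocks of length `block`."""
    y = []
    for start in range(0, len(x), block):
        coeffs = dct2(x[start:start + block])
        y.extend(idct2([step * round(c / step) for c in coeffs]))
    return y

LENGTH, ONSET, STEP = 256, 200, 0.05  # assumed signal length, click position, quantizer step
x = [0.0] * LENGTH
x[ONSET] = 1.0                        # silence, then a click

def pre_echo(block):
    """Return (error energy before the onset, index of the first sample with non-negligible error)."""
    err = [a - b for a, b in zip(code_blocks(x, block, STEP), x)]
    energy = sum(e * e for e in err[:ONSET])
    first = next((i for i, e in enumerate(err) if abs(e) > 1e-9), len(err))
    return energy, first

for block in (256, 32):               # one long block vs. eight short blocks
    energy, first = pre_echo(block)
    print(f"block={block:3d}: pre-echo energy {energy:.2e}, noise starts at sample {first}")
```

With short blocks, the silent blocks before the click code to exactly zero, so the error can only start in the short block that contains the click; with one long block it can start anywhere in the block. In a real MDCT codec the overlap between windows softens this boundary, which is why encoders also use transition windows when switching block sizes.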

The original sound (time ___domain) is shown on the top row: a sharp, percussive transient from castanets. Underneath, the reconstructed sound (encoded with the LAME MP3 encoder) exhibits significantly reduced quantization artifacts compared with the upper screenshot, because smaller transform windows are used around the transient. Transform window boundaries are indicated by vertical yellow lines.

To better reproduce transients and reduce pre-echo, some lossy audio encoders, such as the open-source Vorbis encoder (oggenc from vorbis-tools), offer advanced options such as impulse noise tuning and/or a bit reservoir.

References

[1] Iwai, Kyle K. (1994). "Pre-Echo Detection & Reduction" (PDF).
[2] Zwicker, Eberhard; Fastl, Hugo (2010). Psychoacoustics: Facts and Models. Springer Series in Information Sciences (3rd ed., reprint). Berlin, Heidelberg: Springer. ISBN 978-3-540-23159-2.

External links

Pre-echo at Hydrogenaudio Knowledgebase