Overlap–save method: Difference between revisions

Content deleted Content added
add a figure and more discussion of efficiency
mention rotating the h[n] buffer
Line 1:
{{hatnote|This article uses common abstract notations, such as <math display="inline">y(t) = x(t) * h(t),</math> or <math display="inline">y(t) = \mathcal{H}\{x(t)\},</math> in which it is understood that the functions should be thought of in their totality, rather than at specific instants <math display="inline">t.</math> (see [[Convolution#Notation]])}}
'''Overlap–save''' is the traditional name for an efficient way to evaluate the [[Convolution#Discrete convolution|discrete convolution]] between a very long signal <math>x[n]</math> and a [[finite impulse response]] (FIR) filter <math>h[n]</math>''':'''
 
{{NumBlk|:|<math>y[n] = x[n] * h[n] \ \triangleq \ \sum_{m=-\infty}^{\infty} h[m] \cdot x[n-m] = \sum_{m=1}^{M} h[m] \cdot x[n-m],\,</math>|{{EquationRef|Eq.1}}}}
 
where h[m]=0 for m outside the region [1, ''M''].
 
In [[signal processing]], '''''Overlap–save''''' is the traditional name for an efficient way to evaluate the [[Convolution#Discrete convolution|discrete convolution]] between a very long signal <math>x[n]</math> and a [[finite impulse response]] (FIR) filter <math>h[n]</math>''':'''
[[Image:Overlap-save algorithm.svg|thumb|500px|Fig 1: A sequence of 4 plots depicts one cycle of the overlap–save convolution algorithm. The 1st plot is a long sequence of data to be processed with a lowpass FIR filter. The 2nd plot is one segment of the data to be processed in piecewise fashion. The 3rd plot is the filtered segment, with the usable portion colored red. The 4th plot shows the filtered segment appended to the output stream.{{efn-ua
|Rabiner and Gold, Fig 2.35, fourth trace
}} The FIR filter is a boxcar lowpass with M=16 samples, the length of the segments is L=100 samples and the overlap is 15 samples.]]
 
{{NumBlk|:|<math>
y[n] = x[n] * h[n]
\ \triangleq\ \sum_{m=-\infty}^{\infty} h[m] \cdot x[n - m]
= \sum_{m=1}^{M} h[m] \cdot x[n - m],
</math>|{{EquationRef|Eq.1}}}}
 
where {{nowrap|''h''[''m''] {{=}} 0}} for ''m'' outside the region {{nowrap|[1, ''M'']}}.
 
The concept is to compute short segments of ''y''[''n''] of an arbitrary length ''L'', and concatenate the segments together. Consider a segment that begins at ''n'' = ''kL''&nbsp;+&nbsp;''M'', for any integer ''k'', and define''':'''
 
:<math>x_k[n] \ \triangleq
\begin{cases}
x[n+kL], & 1 \le n \le L+M-1\\
0, & \textrm{otherwise}.
\end{cases}
</math>
 
:<math>y_k[n] \ \triangleq \ x_k[n]*h[n] = \sum_{m=1}^{M} h[m] \cdot x_k[n-m].</math>
 
Then, for ''kL''&nbsp;+&nbsp;''M'' &nbsp;≤&nbsp; ''n'' &nbsp;≤&nbsp; ''kL''&nbsp;+&nbsp;''L''&nbsp;+&nbsp;''M''&nbsp;−&nbsp;1, and equivalently '''''M''''' &nbsp;≤&nbsp; ''n''&nbsp;−&nbsp;''kL'' &nbsp;≤&nbsp; '''''L''&nbsp;+&nbsp;''M''&nbsp;−&nbsp;1''', we can write''':'''
Line 23 ⟶ 29:
:<math>y[n] = \sum_{m=1}^{M} h[m] \cdot x_k[n-kL-m] \ \ \triangleq \ \ y_k[n-kL].</math>
 
With the substitution &nbsp;{{math|j ≜ n-kL}},&nbsp; the task is reduced to computing {{math|y{{sub|k}}(j)}}, for '''''M''''' &nbsp;≤&nbsp; {{mvar|j}} &nbsp;≤&nbsp; '''''L''&nbsp;+''&nbsp;M''&nbsp;−&nbsp;1'''. TheThese processsteps described above isare illustrated in the accompanyingfirst 3 traces of Figure figure1.
 
Also note that ifIf we periodically extend ''x''<sub>''k''</sub>[''n''] with period ''N'' &nbsp;≥&nbsp; ''L''&nbsp;+&nbsp;''M''&nbsp;−&nbsp;1, according to''':'''
 
:<math>x_{k,N}[n] \ \triangleq \ \sum_{\ell=-\infty}^{\infty} x_k[n - \ell N],</math>
 
the convolutions &nbsp;<math>(x_{k,N})*h\,</math>&nbsp; and &nbsp;<math>x_k*h\,</math>&nbsp; are equivalent in the region ''M'' &nbsp;≤&nbsp; ''n'' &nbsp;≤&nbsp; ''L''&nbsp;+&nbsp;''M''&nbsp;−&nbsp;1. &nbsp; It is therefore sufficient to compute the '''N'''-point [[circular convolution|circular (or cyclic) convolution]] of <math>x_k[n]\,</math> with <math>h[n]\,</math>&nbsp; in the region [1,&nbsp;''N'']. &nbsp;The subregion [''M'',&nbsp;''L''&nbsp;+&nbsp;''M''&nbsp;−&nbsp;1] is appended to the output stream, and the other values are <u>discarded</u>.&nbsp; The advantage is that the circular convolution can be computed more efficiently than linear convolution, according to the [[Discrete Fourier transform#Circular convolution theorem and cross-correlation theorem|circular convolution theorem]]''':'''
 
{{NumBlk|:|<math>y_k[n]\ =\ \scriptstyle \text{DFT}^{-1IDFT}_N \displaystyle (\ \scriptstyle \text{DFT}_N \displaystyle (x_k[n])\cdot\ \scriptstyle \text{DFT}_N \displaystyle (h[n])\ ),</math>|{{EquationRef|Eq.2}}}}
 
where''':'''
*DFT<sub>N</sub> and DFTIDFT<supsub>−1N</supsub> refer to the [[Discrete Fourier transform]] and its inverse, evaluated over ''N'' discrete points, and
*{{math|L}} is customarily chosen such that {{math|N {{=}} L+M-1}} is an integer power-of-2, and the transforms are implemented with the [[Fast Fourier transform|FFT]] algorithm, for efficiency.
*As shown in figureFigure 1 (3rd trace), the leading and trailing edge-effects of convolution are overlapped and added{{efn-ua
|Not to be confused with the [[Overlap-add method]], which preserves separate leading and trailing edge-effects.
}} (and subsequently discarded). In other words, the first output value is a weighted average of the <u>last</u> M-1 samples of the input segment (and the first sample of the segment). The next M-2 outputs are weighted averages of both the beginning and the end of the segment. The M<sup>th</sup> output value is the first one that combines only samples from the beginning of the segment.
**The overlap region can also be shifted to the last M-1 outputs, which is a potential run-time convenience, because the IDFT can be computed in the <math>y[n]</math> buffer, instead of being computed and copied. Then the overlap can be overwritten by the next IDFT. The shift is accomplished by replacing <math>\scriptstyle \text{DFT}_N \displaystyle (h[n])</math> with <math>\scriptstyle \text{DFT}_N \displaystyle (h[n+M-1]) =\ \scriptstyle \text{DFT}_N \displaystyle (h[n+M-1-N]),</math> meaning that the N-length buffer is ''circularly-shifted'' (rotated) by M-1 samples.
 
==Pseudocode==
Line 59 ⟶ 66:
[[Image:FFT_size_vs_filter_length_for_Overlap-add_convolution.svg|thumb|400px|Fig 2: A graph of the values of N (an integer power of 2) that minimize the cost function <math>\tfrac{N\left(\log_2 N + 1\right)}{N - M + 1}</math>]]
 
When the DFT and itsIDFT inverse isare implemented by the FFT algorithm, the pseudocode above requires about {{nowrap|'''N (log<sub>2</sub>(N) + 1)'''}} complex multiplications for the FFT, product of arrays, and IFFT.{{efn-ua
|1=Cooley–Tukey FFT algorithm for N=2<sup>k</sup> needs (N/2) log<sub>2</sub>(N) – see [[Fast Fourier transform#Definition and speed|FFT – Definition and speed]]
}} Each iteration produces {{nowrap|'''N-M+1'''}} output samples, so the number of complex multiplications per output sample is about''':'''