Overlap–save method: Difference between revisions

Content deleted Content added
same information, clarified
add a figure and more discussion of efficiency
Line 5:
where h[m]=0 for m outside the region [1, ''M''].
 
[[Image:Overlap-save algorithm.svg|thumb|500px|Fig 1: A sequence of 4 plots depicts one cycle of the overlap–save convolution algorithm. The 1st plot is a long sequence of data to be processed with a lowpass FIR filter. The 2nd plot is one segment of the data to be processed in piecewise fashion. The 3rd plot is the filtered segment, with the usable portion colored red. The 4th plot shows the filtered segment appended to the output stream.{{efn-ua
|Rabiner and Gold, Fig 2.35, fourth trace
}} The FIR filter is a boxcar lowpass with M=16 samples, the length of the segments is L=100 samples and the overlap is 15 samples.]]
Line 31:
the convolutions &nbsp;<math>(x_{k,N})*h\,</math>&nbsp; and &nbsp;<math>x_k*h\,</math>&nbsp; are equivalent in the region ''M'' &nbsp;≤&nbsp; ''n'' &nbsp;≤&nbsp; ''L''&nbsp;+&nbsp;''M''&nbsp;−&nbsp;1. It is therefore sufficient to compute the '''N'''-point [[circular convolution|circular (or cyclic) convolution]] of <math>x_k[n]\,</math> with <math>h[n]\,</math>&nbsp; in the region [1,&nbsp;''N'']. &nbsp;The subregion [''M'',&nbsp;''L''&nbsp;+&nbsp;''M''&nbsp;−&nbsp;1] is appended to the output stream, and the other values are <u>discarded</u>.&nbsp; The advantage is that the circular convolution can be computed more efficiently than linear convolution, according to the [[Discrete Fourier transform#Circular convolution theorem and cross-correlation theorem|circular convolution theorem]]''':'''
 
{{NumBlk|:|<math>y_k[n] = \scriptstyle \text{DFT}^{-1} \displaystyle (\ \scriptstyle \text{DFT} \displaystyle (x_k[n])\cdot \scriptstyle \text{DFT} \displaystyle (h[n])\ ),</math>|{{EquationRef|Eq.2}}}}
 
where''':'''
*DFT and DFT<sup>−1</sup> refer to the [[Discrete Fourier transform]] and its inverse, evaluated over ''N'' discrete points, and
*{{math|L}} is customarily chosen such that {{math|N {{=}} L+M-1}} is an integer power-of-2, and the transforms are implemented with the [[Fast Fourier transform|FFT]] algorithm, for efficiency.
**As shown in the figure 1 (3rd trace), the leading and trailing edge-effects of convolution are overlapped and added{{efn-ua
**Optimal N is in the range [4M, 8M].
**As shown in the figure (3rd trace), the leading and trailing edge-effects of convolution are overlapped and added{{efn-ua
|Not to be confused with the [[Overlap-add method]], which preserves separate leading and trailing edge-effects.
}} (and subsequently discarded). In other words, the first output value is a weighted average of the <u>last</u> M-1 samples of the input segment (and the first sample of the segment). The next M-2 outputs are weighted averages of both the beginning and the end of the segment. The M<sup>th</sup> output value is the first one that combines only samples from the beginning of the segment.
Line 46 ⟶ 45:
M = length(h)
overlap = M − 1
N = 48 × overlap <span style="color:green;">(orsee next section for a nearbybetter power-of-2choice)</span>
step_size = N − overlap
H = DFT(h, N)
Line 57 ⟶ 56:
'''end'''
 
==Efficiency considerations==
[[Image:FFT_size_vs_filter_length_for_Overlap-add_convolution.svg|thumb|400px|Fig 2: A graph of the values of N (an integer power of 2) that minimize the cost function <math>\tfrac{N\left(\log_2 N + 1\right)}{N - M + 1}</math>]]
When the DFT and its inverse is implemented by the FFT algorithm, the pseudocode above requires about '''N log<sub>2</sub>(N) + N''' complex multiplications for the FFT, product of arrays, and IFFT.{{efn-ua
 
When the DFT and its inverse is implemented by the FFT algorithm, the pseudocode above requires about {{nowrap|'''N (log<sub>2</sub>(N) + N1)'''}} complex multiplications for the FFT, product of arrays, and IFFT.{{efn-ua
|1=Cooley–Tukey FFT algorithm for N=2<sup>k</sup> needs (N/2) log<sub>2</sub>(N) – see [[Fast Fourier transform#Definition and speed|FFT – Definition and speed]]
}} Each iteration produces {{nowrap|'''N-M+1'''}} output samples, so the number of complex multiplications per output sample is about''':'''
 
{{NumBlk|:|<math>\frac{N (\log_2(N) + 1)}{N-M+1}.\,</math>|{{EquationRef|Eq.3}}}}
 
For example, when '''M'''=201 and '''N'''=1024, {{EquationNote|Eq.23}} equals 13.67, whereas direct evaluation of {{EquationNote|Eq.1}} would require up to 201 complex multiplications per output sample, the worst case being when both '''x''' and '''h''' are complex-valued. Also note that for any given '''M''', {{EquationNote|Eq.23}} has a minimum with respect to '''N'''. ItFigure diverges2 is a graph of the values of N that minimize {{EquationNote|Eq.3}} for botha smallrange andof largefilter blocklengths sizes(M).
 
Instead of {{EquationNote|Eq.1}}, we can also consider applying {{EquationNote|Eq.2}} to a long sequence of length <math>N_x</math> samples. The total number of complex multiplications would be:
 
:<math>N_x\cdot (\log_2(N_x) + 1).</math>
 
Comparatively, the number of complex multiplications required by the pseudocode algorithm is:
 
{{NumBlk|:|<math>N_x\frac{Ncdot (\log_2(N) + 1)\cdot \frac{N}{N-M+1}.\,</math>|{{EquationRef|Eq.2}}}}
 
Hence the ''cost'' of the overlap–save method scales almost as <math>O\left(N_x\log_2 N\right)</math> while the cost of a single, large circular convolution is almost <math>O\left(N_x\log_2 N_x \right)</math>.
For example, when '''M'''=201 and '''N'''=1024, {{EquationNote|Eq.2}} equals 13.67, whereas direct evaluation of {{EquationNote|Eq.1}} would require up to 201 complex multiplications per output sample, the worst case being when both '''x''' and '''h''' are complex-valued. Also note that for any given '''M''', {{EquationNote|Eq.2}} has a minimum with respect to '''N'''. It diverges for both small and large block sizes.
 
==Overlap–discard==