Rader's FFT algorithm: Difference between revisions

Revision as of 21:20, 6 June 2003 by Stevenj (noted complexity of recursive application, relationship to Cunningham chains); latest revision as of 21:35, 10 December 2024 by 209.6.125.39 (→Evaluating the convolution: remove dubious claims (see Talk), which were unsourced anyway)
(98 intermediate revisions by 32 users not shown)
{{Short description|Discrete Fourier transform for prime sizes}}
'''Rader's algorithm''' (1968),<ref>C. M. Rader, "Discrete Fourier transforms when the number of data samples is prime," ''Proc. IEEE'' 56, 1107–1108 (1968).</ref> named for Charles M. Rader of [[MIT Lincoln Laboratory]], is a [[fast Fourier transform]] (FFT) algorithm that computes the [[discrete Fourier transform]] (DFT) of [[prime number|prime]] sizes by re-expressing the DFT as a cyclic [[convolution]] (the other algorithm for FFTs of prime sizes, [[Bluestein's FFT algorithm|Bluestein's algorithm]], also works by rewriting the DFT as a convolution).

Since Rader's algorithm only depends upon the periodicity of the DFT kernel, it is directly applicable to any other transform (of prime order) with a similar property, such as a [[number-theoretic transform]] or the [[discrete Hartley transform]].

The algorithm can be modified to gain a factor of two savings for the case of DFTs of real data, using a slightly modified re-indexing/permutation to obtain two half-size cyclic convolutions of real data;<ref>S. Chu and C. Burrus, "A prime factor FTT <nowiki>[</nowiki>''sic''<nowiki>]</nowiki> algorithm using distributed arithmetic," ''IEEE Transactions on Acoustics, Speech, and Signal Processing'' '''30''' (2), 217–227 (1982).</ref> an alternative adaptation for DFTs of real data uses the [[discrete Hartley transform]].<ref name=Frigo05>Matteo Frigo and [[Steven G. Johnson]], "[http://fftw.org/fftw-paper-ieee.pdf The Design and Implementation of FFTW3]," ''Proceedings of the IEEE'' '''93''' (2), 216–231 (2005).</ref>

Winograd extended Rader's algorithm to include prime-power DFT sizes <math>p^m</math>,<ref>S. Winograd, "On Computing the Discrete Fourier Transform", ''Proc. National Academy of Sciences USA'', '''73'''(4), 1005–1006 (1976).</ref><ref>S. Winograd, "On Computing the Discrete Fourier Transform", ''Mathematics of Computation'', '''32'''(141), 175–199 (1978).</ref> and today Rader's algorithm is sometimes described as a special case of [[Fast Fourier transform#Other FFT algorithms|Winograd's FFT algorithm]], also called the ''multiplicative Fourier transform algorithm'' (Tolimieri et al., 1997),<ref>R. Tolimieri, M. An, and C. Lu, ''Algorithms for Discrete Fourier Transform and Convolution'', Springer-Verlag, 2nd ed., 1997.</ref> which applies to an even larger class of sizes. However, for [[composite number|composite]] sizes such as prime powers, the [[Cooley–Tukey FFT algorithm]] is much simpler and more practical to implement, so Rader's algorithm is typically only used for large-prime [[Base case (recursion)|base case]]s of Cooley–Tukey's [[Recursion (computer science)|recursive]] decomposition of the DFT.<ref name=Frigo05/>

==Algorithm==
[[File:FFT visual Rader 11.jpg|thumb|Visual representation of a [[DFT matrix]] in Rader's FFT algorithm. The array consists of colored clocks representing a DFT matrix of size 11. By permuting rows and columns (except the first of each) according to sequences generated by the powers of the primitive root of 11, the original DFT matrix becomes a [[circulant matrix]]. Multiplying a data sequence with a circulant matrix is equivalent to the [[cyclic convolution]] with the matrix's row vector. This relation is an example of the fact that the [[multiplicative group]] is cyclic: <math>(\mathbb Z/p\mathbb Z)^\times \cong C_{p-1}</math>.]]

Begin with the definition of the discrete Fourier transform:

:<math> X_k = \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N} nk } \qquad k = 0,\dots,N-1. </math>

If ''N'' is a prime number, then the set of non-zero indices <math>n \in{} \{1,\dots,N-1\}</math> forms a [[group (mathematics)|group]] under multiplication [[modular arithmetic|modulo]] ''N''. One consequence of the [[number theory]] of such groups is that there exists a [[generating set of a group|generator]] of the group (sometimes called a [[Primitive root modulo n|primitive root]], which can be found by exhaustive search or slightly better algorithms<ref>Donald E. Knuth, ''The Art of Computer Programming, vol. 2: Seminumerical Algorithms'', 3rd edition, section 4.5.4, p. 391 (Addison–Wesley, 1998).</ref>). This generator is an integer ''g'' such that <math>n = g^q \pmod N</math> for any non-zero index ''n'' and for a unique <math>q \in{} \{0,\dots,N-2\}</math> (forming a [[bijection]] from ''q'' to non-zero ''n''). Similarly, <math>k = g^{-p} \pmod N</math> for any non-zero index ''k'' and for a unique <math>p \in{} \{0,\dots,N-2\}</math>, where the negative exponent denotes the [[modular multiplicative inverse|multiplicative inverse]] of <math>g^p \mod N</math>.
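
The index mapping described above can be illustrated with a short Python sketch. It finds a generator by exhaustive search and tabulates the two permutations ''n'' = ''g''<sup>''q''</sup> and ''k'' = ''g''<sup>−''p''</sup> modulo ''N''. The function names (<code>prime_factors</code>, <code>primitive_root</code>, <code>rader_permutations</code>) are hypothetical and chosen only for this example; testing each candidate against the prime factors of ''N''–1 is one common approach and is an assumption here, not something prescribed by the text above.

```python
def prime_factors(m):
    """Return the set of distinct prime factors of m (trial division)."""
    factors = set()
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors.add(d)
            m //= d
        d += 1
    if m > 1:
        factors.add(m)
    return factors


def primitive_root(N):
    """Smallest generator of the multiplicative group modulo a prime N."""
    if N == 2:
        return 1  # the group {1} is generated by 1
    order = N - 1
    qs = prime_factors(order)
    for g in range(2, N):
        # g generates the group iff g^(order/q) != 1 for every prime q | order
        if all(pow(g, order // q, N) != 1 for q in qs):
            return g
    raise ValueError("N must be prime")


def rader_permutations(N):
    """Return (g, [g^q mod N], [g^-p mod N]) for q, p = 0..N-2."""
    g = primitive_root(N)
    g_inv = pow(g, N - 2, N)  # inverse of g, by Fermat's little theorem
    n_of_q = [pow(g, q, N) for q in range(N - 1)]
    k_of_p = [pow(g_inv, p, N) for p in range(N - 1)]
    return g, n_of_q, k_of_p
```

For ''N'' = 11 this gives ''g'' = 2, input indices 1, 2, 4, 8, 5, 10, 9, 7, 3, 6 and output indices 1, 6, 3, 7, 9, 10, 5, 8, 4, 2; each list visits every non-zero index exactly once.
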
Using these new indices ''p'' and ''q'', the DFT can be rewritten as:

:<math> X_0 = \sum_{n=0}^{N-1} x_n,</math>
:<math> X_{g^{-p}} = x_0 + \sum_{q=0}^{N-2} x_{g^q} e^{-\frac{2\pi i}{N} g^{-(p-q)} } \qquad p = 0,\dots,N-2. </math>

(Recall that ''x''<sub>''n''</sub> and ''X''<sub>''k''</sub> are implicitly periodic in ''N'', and also that <math> e^{2\pi i}=1 </math> ([[Euler's identity]]). Thus, all indices and exponents are taken modulo ''N'' as required by the group arithmetic.)

The final summation, above, is precisely a cyclic convolution of the two sequences ''a''<sub>''q''</sub> and ''b''<sub>''q''</sub> (of length ''N''–1, because <math>q \in{} \{0,\dots,N-2\}</math>) defined by:

:<math>a_q = x_{g^q}</math>
:<math>b_q = e^{-\frac{2\pi i}{N} g^{-q} }.</math>

===Evaluating the convolution===
Since ''N''–1 is composite (for any prime ''N'' > 3), this convolution can be performed directly via the [[convolution theorem]] and more conventional FFT algorithms. However, that may not be efficient if ''N''–1 itself has large prime factors, requiring recursive use of Rader's algorithm. Instead, one can compute a length-(''N''–1) cyclic convolution exactly by zero-padding it to a length of at least 2(''N''–1)–1, say to a [[power of two]], which can then be evaluated in O(''N'' log ''N'') time without the recursive application of Rader's algorithm.

This algorithm, then, requires O(''N'') additions plus O(''N'' log ''N'') time for the convolution.
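
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the helper names (<code>fft_pow2</code>, <code>cyclic_convolve</code>, <code>find_generator</code>, <code>rader_dft</code>) are hypothetical, the generator is found by a naive search, and the zero-padded convolution uses a simple recursive radix-2 FFT whose linear-convolution result is folded back to length ''N''–1.

```python
import cmath


def fft_pow2(a, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft_pow2(a[0::2], inverse)
    odd = fft_pow2(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def cyclic_convolve(a, b):
    """Length-L cyclic convolution via zero-padding to a power of two."""
    L = len(a)
    M = 1
    while M < 2 * L - 1:  # pad to at least 2L-1 to avoid wrap-around
        M *= 2
    fa = fft_pow2(list(a) + [0j] * (M - L))
    fb = fft_pow2(list(b) + [0j] * (M - L))
    lin = [v / M for v in fft_pow2([u * v for u, v in zip(fa, fb)], True)]
    # fold the linear convolution (length 2L-1) back onto L cyclic outputs
    return [lin[i] + (lin[i + L] if i + L < 2 * L - 1 else 0)
            for i in range(L)]


def find_generator(N):
    """Naive exhaustive search for a generator modulo a prime N."""
    for g in range(1, N):
        if len({pow(g, q, N) for q in range(N - 1)}) == N - 1:
            return g
    raise ValueError("N must be prime")


def rader_dft(x):
    """DFT of a prime-length sequence x via Rader's re-indexing."""
    N = len(x)
    g = find_generator(N)
    g_inv = pow(g, N - 2, N)  # inverse of g modulo N
    a = [x[pow(g, q, N)] for q in range(N - 1)]            # a_q = x_{g^q}
    b = [cmath.exp(-2j * cmath.pi * pow(g_inv, q, N) / N)  # b_q
         for q in range(N - 1)]
    conv = cyclic_convolve(a, b)
    X = [0j] * N
    X[0] = sum(x)
    for p in range(N - 1):
        X[pow(g_inv, p, N)] = x[0] + conv[p]  # X_{g^-p} = x_0 + (a * b)_p
    return X
```

For example, <code>rader_dft([1, 2, 3, 4, 5])</code> should agree with a direct evaluation of the DFT definition up to floating-point rounding.
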
In practice, the O(''N'') additions can often be performed by absorbing the additions into the convolution: if the convolution is performed by a pair of FFTs, then the sum of ''x''<sub>''n''</sub> is given by the DC (0th) output of the FFT of ''a''<sub>''q''</sub> plus ''x''<sub>0</sub>, and ''x''<sub>0</sub> can be added to all the outputs by adding it to the DC term of the convolution prior to the inverse FFT. Still, this algorithm requires intrinsically more operations than FFTs of nearby composite sizes, and typically takes 3–10 times as long in practice.

If Rader's algorithm is performed by using FFTs of size ''N''–1 to compute the convolution, rather than by zero padding as mentioned above, the efficiency depends strongly upon ''N'' and the number of times that Rader's algorithm must be applied recursively. The worst case would be if ''N''–1 were 2''N''<sub>2</sub> where ''N''<sub>2</sub> is prime, with ''N''<sub>2</sub>–1 = 2''N''<sub>3</sub> where ''N''<sub>3</sub> is prime, and so on. Such ''N''<sub>j</sub> are called [[Sophie Germain prime]]s, and such a sequence of them is called a [[Cunningham chain]] of the first kind. However, the alternative of zero padding can always be employed if ''N''–1 has a large prime factor.
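
The worst-case pattern described above can be made concrete with a small Python sketch that, starting from a prime ''N'', follows the relation ''N''<sub>''j''</sub>–1 = 2''N''<sub>''j''+1</sub> for as long as the next term is prime. The function names (<code>is_prime</code>, <code>worst_case_chain</code>) are hypothetical, and trial division is assumed only because it is adequate for small illustrative inputs.

```python
def is_prime(m):
    """Trial-division primality test, adequate for small m."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def worst_case_chain(N):
    """List N, N2, N3, ... where each term minus one is twice the next prime."""
    chain = [N]
    while (chain[-1] - 1) % 2 == 0 and is_prime((chain[-1] - 1) // 2):
        chain.append((chain[-1] - 1) // 2)
    return chain
```

For instance, <code>worst_case_chain(47)</code> returns <code>[47, 23, 11, 5, 2]</code>, so a size-47 transform evaluated with FFTs of size ''N''–1 would encounter a prime-size sub-transform at each level, whereas <code>worst_case_chain(17)</code> returns <code>[17]</code> because 16 is a power of two.
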
==References==
<references/>

[[Category:FFT algorithms]]