Revision as of 03:27, 28 March 2024 edit Dratman (talk \| contribs) Extended confirmed users 5,035 edits Removed the "like an advertisement" template since I know this subject and there is no impression of advertising now. Tag: Visual edit ← Previous edit		Revision as of 17:00, 13 May 2024 edit undo WOSlinker (talk \| contribs) Administrators 861,415 edits m remove href oddities Next edit →
Line 70: \end{pmatrix},\quad C_{ij}=\sum_{k=1}^m A_{ik}B_{kj}</math> To compute each element in {{math\|'''C'''}} takes {{math\|''m''}} multiplications and {{math\|(''m'' – ''1'')}} additions. Therefore, with a CPU implementation, the time complexity to achieve this computation is ''Θ(n''<sup ~~href="Category:GPGPU"~~>''3''</sup>'')'' in the following C example''.'' However, we have known that elements in {{math\|'''C'''}} are independent of each other. Hence, the computation can be fully parallelized by SIMD processors, such as GPGPU devices. With a GPGPU implementation, the time complexity significantly reduces to ''Θ(n)'' by unrolling the for-loop as shown in the following OpenCL example''.''<syntaxhighlight lang="c" line="1"> // MxM matrix multiplication in C void matrixMul( Line 115: ===Multidimensional convolution=== Convolution is a frequently used operation in DSP. To compute the 2-D convolution of two ''m'' × ''m'' signals, it requires {{math\|''m''<sup>''2''</sup>}} multiplications and {{math\|''m'' × (''m'' – ''1'')}} additions for an output element. That is, the overall time complexity is ''Θ(n''<sup ~~href="Category:GPGPU"~~>''4''</sup>'')'' for the entire output signal''.'' As the following OpenCL example shows, with GPGPU acceleration, the total computation time effectively decreases to ''Θ(n''<sup ~~href="Category:GPGPU"~~>''2''</sup>'')'' since all output elements are data independent. 2-D convolution equation: Line 145: </syntaxhighlight> Note that, although the example demonstrated above is a 2-D convolution, a similar approach can be adopted for a higher dimension system. Overall, for a s-D convolution, a GPGPU implementation has time complexity ''Θ(n''<sup ~~href="Category:GPGPU"~~>''s''</sup>'')'', whereas a CPU implementation has time complexity ''Θ(n''<sup ~~href="Category:GPGPU"~~>''2s''</sup>'')''. M-D convolution equation: Line 156: <math>X(\Omega_1,\Omega_2,...,\Omega_s)=\sum_{n_1=0}^{m_1-1}\sum_{n_2=0}^{m_2-1}...\sum_{n_s=0}^{m_s-1}x(n_1, n_2,...,n_s)e^{-j(\Omega_1n_1+\Omega_1n_1+...+\Omega_sn_s)}</math> Practically, to implement an M-D DTFT, we can perform M times 1-D DFTF and matrix transpose with respect to each dimension. With a 1-D DTFT operation, GPGPU can conceptually reduce the complexity from ''Θ(n''<sup ~~href="Category:GPGPU"~~>''2''</sup>'')'' to Θ(n'')'' as illustrated by the following example of OpenCL implementation''.'' That is, an M-D DTFT the complexity of GPGPU can be computed on a GPU with a complexity of ''Θ(n''<sup ~~href="Category:GPGPU"~~>''2''</sup>'').'' While some GPGPUs are also equipped with hardware FFT accelerators internally, this implementation might be also optimized by invoking the FFT APIs or libraries provided by GPU manufacture.<ref>{{cite web\|title=OpenCL™ Optimization Case Study Fast Fourier Transform – Part II – AMD\|url=http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-fast-fourier-transform-part-ii/\|website=AMD\|access-date=2015-11-05\|language=en-US}}</ref><syntaxhighlight lang="c++" line="1"> // DTFT in OpenCL __kernel void convolution(

Multidimensional DSP with GPU acceleration: Difference between revisions