Multidimensional DSP with GPU acceleration: Difference between revisions

SwisterTwister (talk | contribs)
Cleaning up accepted Articles for creation submission (AFCH 0.9)
Sing0512 (talk | contribs)
\end{pmatrix},\quad C_{ij}=\sum_{k=1}^m A_{ik}B_{kj}</math>
 
Computing each element of {{math|'''C'''}} takes {{math|''m''}} multiplications and {{math|(''m'' − 1)}} additions. Since {{math|'''C'''}} has {{math|''m''<sup>2</sup>}} elements, a sequential CPU implementation has time complexity ''Θ''(''m''<sup>3</sup>), as in the following C example. However, the elements of {{math|'''C'''}} are independent of each other, so the computation can be fully parallelized on SIMD processors such as GPGPU devices. With a GPGPU implementation that assigns one work-item to each element of {{math|'''C'''}}, and given enough processing elements to run all work-items concurrently, the time complexity reduces to ''Θ''(''m''), because the two outer for-loops are unrolled across the work-items, as shown in the following OpenCL example.<source lang="c" line="1">
// MxM matrix multiplication in C
void matrixMul(