Revision as of 23:52, 3 November 2015 by Sing0512 (→Adopting Supercomputers)
Revision as of 00:06, 4 November 2015 by Sing0512 (→Matrix Multiplication)
Line 63: \end{pmatrix},\quad C_{ij}=\sum_{k=1}^m A_{ik}B_{kj}</math>

Computing each element of {{math|'''C'''}} takes {{math|''m''}} multiplications and {{math|(''m'' − 1)}} additions. For square ''n'' × ''n'' matrices, a sequential CPU implementation computes the ''n''<sup>''2''</sup> elements one after another, so its time complexity is ''Θ(n''<sup>''3''</sup>'')''. However, the elements of {{math|'''C'''}} are independent of one another. Hence, the computation can be fully parallelized on SIMD processors, such as GPGPU devices. With a GPGPU implementation in which each work-item computes one element, the running time reduces to ''Θ(n)'', provided the device has enough processing elements to run all ''n''<sup>''2''</sup> work-items concurrently, as in the following OpenCL example.

<source lang="c++" line="1">
// Each work-item computes one element of the size x size output matrix C.
__kernel void mul(
    __global const float *A, // input matrix A (row-major)
    __global const float *B, // input matrix B (row-major)
    __global float *C,       // output matrix C (row-major)
    const int size)          // number of rows (and columns) of each matrix
{
    size_t id = get_global_id(0); // each work-item works on one element
    size_t row = id / size;
    size_t col = id % size;
    float sum = 0.0f;

    for (int m = 0; m < size; m++) {
        sum += A[row * size + m] * B[m * size + col];
    }
    C[id] = sum;
}
</source>

=== Multidimensional Convolution (M-D Convolution) ===

Multidimensional DSP with GPU acceleration: Difference between revisions