Multidimensional DSP with GPU acceleration: Difference between revisions

Sing0512 (talk | contribs)
\end{pmatrix},\quad C_{ij}=\sum_{k=1}^m A_{ik}B_{kj}</math>
 
Computing each element of {{math|'''C'''}} takes {{math|''m''}} multiplications and {{math|(''m'' − 1)}} additions. For ''n'' × ''n'' matrices (that is, ''m'' = ''n''), a sequential CPU implementation therefore has time complexity ''Θ(n''<sup>''3''</sup>'')''. However, the elements of {{math|'''C'''}} are independent of one another, so the computation can be fully parallelized on SIMD processors such as GPGPU devices. With a GPGPU implementation that assigns one thread to each element of {{math|'''C'''}}, each thread performs only ''Θ(n)'' work, so given enough processing elements to run all ''n''<sup>''2''</sup> threads concurrently, the running time reduces to ''Θ(n)'', as in the following OpenCL example.<source lang="c++" line="1">
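__kernel void mul(
    __global const float *A, // input matrix A (size x size, row-major)
    __global const float *B, // input matrix B (size x size, row-major)
    __global float *C,       // output matrix C (size x size, row-major)
    const int size)          // number of rows (and columns) of each matrix
{
    size_t id = get_global_id(0);          // each work-item computes one element of C
    if (id >= (size_t)size * size) return; // skip work-items beyond the matrix
    size_t row = id / size;                // row index of this element
    size_t col = id % size;                // column index of this element
    float sum = 0.0f;
    for (int k = 0; k < size; k++) {       // dot product of row of A and column of B
        sum += A[row * size + k] * B[k * size + col];
    }
    C[id] = sum;
}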
</source>
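
For comparison, the sequential CPU computation can be sketched in C as follows. This is a minimal illustration rather than a definitive implementation: the function name <code>matmul_cpu</code> and the flat row-major storage are assumptions chosen to match the indexing of the OpenCL kernel above. The three nested loops make the ''Θ(n''<sup>''3''</sup>'')'' cost explicit, and the body of the two outer loops corresponds to the work done by a single GPU thread.

```c
#include <assert.h>
#include <stddef.h>

/* Sequential reference: C = A * B for n x n matrices stored row-major.
 * The name matmul_cpu and the flat layout are illustrative assumptions
 * that mirror the indexing of the OpenCL kernel. */
void matmul_cpu(const float *A, const float *B, float *C, size_t n)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (size_t i = 0; i < n; i++) {         /* row of C */
        for (size_t j = 0; j < n; j++) {     /* column of C */
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++) { /* n multiply-adds per element */
                sum += A[i * n + k] * B[k * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}
```

Calling <code>matmul_cpu(A, B, C, n)</code> on the same flattened arrays that would be passed to the kernel should give the same result up to floating-point rounding, which makes it usable as a correctness check for the GPU version.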
 
=== Multidimensional Convolution (M-D Convolution) ===