Content deleted Content added
Citation bot (talk | contribs) Added bibcode. Removed URL that duplicated identifier. Removed parameters. | Use this bot. Report bugs. | Suggested by Headbomb | Linked from Wikipedia:WikiProject_Academic_Journals/Journals_cited_by_Wikipedia/Sandbox | #UCB_webform_linked 825/990 |
|||
(50 intermediate revisions by 25 users not shown) | |||
Line 1:
{{Orphan|date=November 2015}}
Multidimensional Digital Signal Processing (MDSP) refers to the extension of [[Digital signal processing]] (DSP) techniques to signals that vary in more than one dimension. While conventional DSP typically deals with one-dimensional data, such as time-varying [[Audio signal|audio signals]], MDSP involves processing signals in two or more dimensions. Many of the principles from one-dimensional DSP, such as [[Fourier transform|Fourier transforms]] and [[filter design]], have analogous counterparts in multidimensional signal processing.
Modern [[
==Motivation==
▲Modern [[General-purpose computing on graphics processing units|general purpose graphics processing units (GPGPUs)]] are considered having excellent throughput on vector operations and numeric manipulations by high degree of parallel computation. While processing digital signals, particularly multidimensional signals, often involves in a series of vector operations on massive amount of independent data samples, GPGPUs are now widely employed to accelerate multidimensional DSP, such as [[image processing]], [[Video processing|video codec]], [[Radar signal characteristics|radar signal analysis]], [[sonar signal processing]], and [[Ultrasound scan|ultrasound scanning]]. Conceptually, using GPGPU devices to perform multidimensional DSP is able to dramatically reduce the computation complexity compared with [[Cpu|central processing units (CPUs)]], [[Digital signal processor|digital signal processors (DSPs)]], or other [[Field-programmable gate array|FPGA]] accelerators.
Processing multidimensional signals is a common problem in scientific
==Existing
▲Processing multidimensional signals is a common problem in scientific researches and/or engineering computations. Typically, a DSP problem's computation complexity grows exponentially while the number of dimensions increases. Notwithstanding, with a high degree of time and storage complexity, it is extremely difficult to process multidimensional signals in real-time. Although there are many fast algorithms (e.g. [[Fast Fourier transform|FFT]]) have been proposed in 1-D DSP problems, they are still not efficient enough to be adapted in high dimensional DSP problems. Therefore, it is still hard to obtain the computation results with [[Digital signal processor|digital signal processors (DSPs)]]. Hence, a better solution of software algorithm or hardware architecture to accelerate multidimensional DSP computations is strongly required.
Practically, to accelerate multidimensional DSP, some common approaches have been proposed and developed in the past decades.
===
A makeshift to achieve a real-time requirement in multidimensional DSP applications is
===
Digital signal processors are designed specifically to process vector operations. They have been widely used in DSP computations for decades. However, most digital signal processors are only capable of manipulating
===
In order to accelerate multidimensional DSP computations, using dedicated [[
===
[[Graphics processing unit|GPUs]] are originally devised to accelerate image processing and video stream rendering. Moreover, since modern GPUs
==
[[File:SIMD GPGPU.jpg|alt= Figure illustrating a SIMD/vector computation unit in GPGPUs..|thumb|GPGPU/SIMD computation model
Modern GPU designs are mainly based on the [[Single instruction, multiple data|SIMD]] (Single Instruction Multiple Data) computation paradigm.<ref>{{
GPGPUs are able to perform an operation on multiple independent data concurrently with their vector or SIMD functional units. A modern GPGPU can spawn thousands of concurrent threads and process all threads in a batch manner. With this nature, GPGPUs can be employed as DSP accelerators easily while many DSP problems can be solved by [[Divide and conquer algorithms|divide-and-conquer]] algorithms. A large scale and complex DSP problem can be divided into a bunch of small numeric problems and be processed altogether at one time so that the overall time complexity can be reduced significantly. For example, multiplying two {{math|''M'' × ''M''}} matrices can be processed by {{math|''M'' × ''M''}} concurrent threads on a GPGPU device without any output data dependency. Therefore, theoretically, by means of GPGPU acceleration, we can gain up to {{math|''M'' × ''M''}} speedup compared with a traditional CPU or digital signal processor.
==
Currently, there are several existing programming languages or interfaces which support GPGPU programming.
===
[[CUDA]] is the standard interface to program [[Nvidia|NVIDIA]] GPUs. NVIDIA also provides many CUDA libraries to support DSP acceleration on NVIDIA GPU devices.<ref>{{
===
[[OpenCL]] is an industrial standard which was originally proposed by [[Apple Inc.]] and is maintained and developed by the [[Khronos Group]] now.<ref>{{
[[File:OpenCL
The following figure illustrates the execution flow of launching an OpenCL program on a GPU device. The CPU first detects OpenCL devices (GPU in this case) and then invokes a just-in-time compiler to translate the OpenCL source code into target binary. CPU then sends data to GPU to perform computations. When the GPU is processing data, CPU is free to process its own tasks.
===C++
[[C++ AMP
===
[[OpenACC
==
==={{math|''m'' × ''m''}} matrix multiplication===
Suppose {{math|'''A'''}} and {{math|'''B'''}} are two {{math|''m'' × ''m''}} matrices and we would like to compute {{math|1 = '''C''' = '''A''' × '''B'''}}. <math>\mathbf{A}=\begin{pmatrix}
Line 60 ⟶ 61:
\vdots & \vdots & \ddots & \vdots \\
B_{m1} & B_{m2} & \cdots & B_{mm} \\
\end{pmatrix}</math>
<math>\mathbf{C}=\mathbf{A}\times\mathbf{B}=\begin{pmatrix}
Line 67 ⟶ 68:
\vdots & \vdots & \ddots & \vdots \\
C_{m1} & C_{m2} & \cdots & C_{mm} \\
\end{pmatrix},\quad C_{ij}=\sum_{k=1}^m A_{ik}B_{kj}</math>
To compute each element in {{math|'''C'''}} takes {{math|''m''}} multiplications and {{math|(''m''
// MxM matrix multiplication in C
void matrixMul(
Line 81 ⟶ 82:
for (int col = 0; col < size; col++) {
int id = row * size + col;
for (int m = 0; m < size; m++) {
sum += (A[row * size + m] * B[m * size + col]);
}
C[id] = sum;
}
}
}
</
// MxM matrix multiplication in OpenCL
__kernel void matrixMul(
Line 103 ⟶ 104:
size_t col = id % size;
float sum = 0.0;
// N iterations
for (int m = 0; m < size; m++) {
sum += (A[row * size + m] * B[m * size + col]);
}
C[id] = sum;
}
</syntaxhighlight>
===
Convolution is a frequently used operation in DSP. To compute the 2-D convolution of two ''m'' × ''m'' signals, it requires {{math|''m''<sup>''2''</sup>}} multiplications and {{math|''m'' × (''m''
2-D convolution equation:
<math>y(n_1, n_2)=x(n_1,n_2)**h(n_1,n_2)=\sum_{k_1=0}^{m-1}\sum_{k_2=0}^{m-1}x(k_1, k_2)h(n_1-k_1, n_2-k_2)</math><source lang="c++" line="1">▼
▲<math>y(n_1, n_2)=x(n_1,n_2)**h(n_1,n_2)=\sum_{k_1=0}^{m-1}\sum_{k_2=0}^{m-1}x(k_1, k_2)h(n_1-k_1, n_2-k_2)</math><
// 2-D convolution implementation in OpenCL
__kernel void convolution(
Line 130 ⟶ 133:
size_t n2 = id % col;
float sum = 0.0;
// N x N iterations
for (int k1 = 0; k1 < size; k1++) {
Line 137 ⟶ 140:
}
}
C[id] = sum;
}
</syntaxhighlight>
Note that, although the example demonstrated above is a 2-D convolution, a similar approach can be adopted for a higher dimension system. Overall, for a s-D convolution, a GPGPU implementation has time complexity ''Θ(n''<sup
M-D convolution equation:
▲Note that, although the example demonstrated above is a 2-D convolution, a similar approach can be adopted for a higher dimension system. Overall, for a s-D convolution, a GPGPU implementation has time complexity ''Θ(n''<sup href="Category:GPGPU">''s''</sup>'')'' , whereas a CPU implementation has time complexity ''Θ(n''<sup href="Category:GPGPU">''2s''</sup>'')''.
<math>y(n_1,n_2,...,n_s)=x(n_1,n_2,...,n_s)**h(n_1,n_2,...,n_s)=\sum_{k_1=0}^{m_1-1}\sum_{k_2=0}^{m_2-1}...\sum_{k_s=0}^{m_s-1}x(k_1, k_2,...,k_s)h(n_1-k_1,n_2-k_2,...,n_s-k_s)</math>
===
In addition to convolution, the [[Fourier transform|discrete
<math>X(\Omega_1,\Omega_2,...,\Omega_s)=\sum_{n_1=0}^{m_1-1}\sum_{n_2=0}^{m_2-1}...\sum_{n_s=0}^{m_s-1}x(n_1, n_2,...,n_s)e^{-j(\Omega_1n_1+\Omega_1n_1+...+\Omega_sn_s)}</math>
Practically, to implement an M-D DTFT
// DTFT in OpenCL
__kernel void convolution(
Line 163 ⟶ 168:
X_re[id] = 0.0;
X_im[id] = 0.0;
for (int i = 0; i < size; i++) {
X_re += (x_re[id] * cos(2 * 3.1415 * id / size) - x_im[id]
X_im += (x_re[id] * sin(2 * 3.1415 * id / size) + x_im[id]
}
}
</syntaxhighlight>
==Real applications==
Designing a multidimensional digital filter
▲===Digital Filter Design===
▲Designing a digital filter in multidimensional area is a big challenge, especially [[Infinite impulse response|IIR]] filters. Typically it relies on computers to solve difference equations and obtain a set of approximated solutions. While GPGPU computation is becoming popular, several adaptive algorithms have been proposed to design multidimensional [[Finite impulse response|FIR]] and/or [[Infinite impulse response|IIR]] filters by means of GPGPUs<ref>{{Cite journal|title = GPU-efficient Recursive Filtering and Summed-area Tables|url = http://doi.acm.org/10.1145/2024156.2024210|publisher = ACM|journal = Proceedings of the 2011 SIGGRAPH Asia Conference|date = 2011-01-01|___location = New York, NY, USA|isbn = 978-1-4503-0807-6|pages = 176:1–176:12|series = SA '11|doi = 10.1145/2024156.2024210|first = Diego|last = Nehab|first2 = André|last2 = Maximo|first3 = Rodolfo S.|last3 = Lima|first4 = Hugues|last4 = Hoppe}}</ref><ref>{{Cite book|title = GPU Gems 2: Programming Techniques For High-Performance Graphics And General-Purpose Computation|last = Pharr|first = Matt|publisher = Pearson Addison Wesley|year = 2005|isbn = 0321335597|___location = |pages = |last2 = Fernando|first2 = Randima}}</ref><ref>{{Cite book|title = GPU Computing Gems Emerald Edition|last = Hwu|first = Wen-mei W.|publisher = Morgan Kaufmann Publishers Inc.|year = 2011|isbn = 0123859638|___location = San Francisco, CA, USA|pages = }}</ref>.
===Radar
Radar systems usually
===Self-
Many [[self-driving
===Medical
In order to have accurate diagnosis, 2-D or 3-D medical signals, such as [[ultrasound]], [[X-ray]], [[Magnetic resonance imaging|MRI]], and [[CT scan|CT]], often
==References==
Line 192 ⟶ 198:
{{Parallel computing}}
[[
[[
[[
[[
|