{{Orphan|date=November 2015}}
Multidimensional Digital Signal Processing (MDSP) refers to the extension of [[Digital signal processing]] (DSP) techniques to signals that vary in more than one dimension. While conventional DSP typically deals with one-dimensional data, such as time-varying [[Audio signal|audio signals]], MDSP involves processing signals in two or more dimensions. Many of the principles from one-dimensional DSP, such as [[Fourier transform|Fourier transforms]] and [[filter design]], have analogous counterparts in multidimensional signal processing.
Modern [[general-purpose computing on graphics processing units]] (GPGPUs) offer excellent throughput on vector operations and numeric manipulations through a high degree of parallelism. Because processing digital signals, particularly multidimensional signals, often involves a series of vector operations on massive numbers of independent data samples, GPGPUs are now widely employed to accelerate multidimensional DSP in applications such as [[image processing]], [[Video processing|video codecs]], [[Radar signal characteristics|radar signal analysis]], [[sonar signal processing]], and [[ultrasound scan]]ning. Conceptually, GPGPUs can dramatically reduce computation time compared with [[Central processing unit|central processing units (CPUs)]], [[Digital signal processor]]s (DSPs), or other [[Field-programmable gate array|FPGA]] accelerators.
==Motivation==
==GPGPU computations==
[[File:SIMD GPGPU.jpg|alt=Figure illustrating a SIMD/vector computation unit in GPGPUs.|thumb|GPGPU/SIMD computation model]]
Modern GPU designs are mainly based on the [[Single instruction, multiple data|SIMD]] (Single Instruction Multiple Data) computation paradigm.<ref>{{cite journal|title=NVIDIA Tesla: A Unified Graphics and Computing Architecture|journal=IEEE Micro|date=2008-03-01|issn=0272-1732|pages=39–55|volume=28|issue=2|doi=10.1109/MM.2008.31|first1=E.|last1=Lindholm|first2=J.|last2=Nickolls|first3=S.|last3=Oberman|first4=J.|last4=Montrym}}</ref>
GPGPUs are able to perform an operation on many independent data elements concurrently with their vector or SIMD functional units. A modern GPGPU can spawn thousands of concurrent threads and process all of them in a batch manner. Because many DSP problems can be solved by [[Divide and conquer algorithms|divide-and-conquer]] algorithms, GPGPUs can easily be employed as DSP accelerators: a large and complex DSP problem can be divided into many small numeric subproblems that are processed at the same time, so that the overall time complexity is reduced significantly. For example, multiplying two {{math|''M'' × ''M''}} matrices can be processed by {{math|''M'' × ''M''}} concurrent threads on a GPGPU device without any output data dependency. Therefore, theoretically, by means of GPGPU acceleration, we can gain up to {{math|''M'' × ''M''}} speedup compared with a traditional CPU or digital signal processor.
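As an illustration of this one-thread-per-output mapping, a minimal OpenCL kernel might look as follows; the kernel and argument names and the row-major data layout are assumptions for the sketch rather than a particular vendor implementation.
<syntaxhighlight lang="c">
// One work-item per output element: work-item (i, j) computes C[i][j].
// A, B, and C are M x M matrices stored in row-major order.
__kernel void matrixMulKernel(__global const float *A,
                              __global const float *B,
                              __global float *C,
                              const int M)
{
    int i = get_global_id(0);            // row index of this work-item
    int j = get_global_id(1);            // column index of this work-item
    float sum = 0.0f;
    for (int k = 0; k < M; k++)          // dot product of row i of A and column j of B
        sum += A[i * M + k] * B[k * M + j];
    C[i * M + j] = sum;
}
</syntaxhighlight>
When launched with an {{math|''M'' × ''M''}} global work size, every output element is computed by its own work-item with no data dependency between them, which is exactly the data-parallel structure described above.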
[[OpenCL]] is an industry standard that was originally proposed by [[Apple Inc.]] and is now maintained and developed by the [[Khronos Group]].<ref>{{cite web|title=OpenCL – The open standard for parallel programming of heterogeneous systems|url=https://www.khronos.org/opencl/|website=www.khronos.org|date=21 July 2013|access-date=2015-11-05|language=en}}</ref> OpenCL provides [[C++]]-like [[Application programming interface|APIs]] for programming different devices in a uniform way, including GPGPUs.
[[File:OpenCL execution flow rev.jpg|alt=Illustrating the execution flow of an OpenCL program/kernel|thumb|474x474px|OpenCL program execution flow]]
The following figure illustrates the execution flow of launching an OpenCL program on a GPU device. The CPU first detects the available OpenCL devices (a GPU in this case), compiles the kernel source code for the target device, and copies the input data from host memory to device memory. It then launches the kernel on the GPU and, once the computation has finished, copies the results back to host memory.
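The same flow can be written down in host code. The following sketch is a minimal, error-handling-free outline of the standard OpenCL host API calls; the kernel name <code>my_kernel</code>, the buffer layout, and the variable names are illustrative assumptions.
<syntaxhighlight lang="c">
// Minimal OpenCL host-side flow: discover a device, build the kernel,
// move data to the GPU, launch the kernel, and read the result back.
#include <CL/cl.h>

void run_kernel(const char *src, const float *in, float *out, size_t n)
{
    cl_platform_id platform;  cl_device_id device;
    clGetPlatformIDs(1, &platform, NULL);                            // 1. find an OpenCL platform
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);  //    and a GPU device on it

    cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
    cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &device, NULL, NULL, NULL);              // 2. compile the kernel source
    cl_kernel kernel = clCreateKernel(prog, "my_kernel", NULL);      //    "my_kernel" is illustrative

    cl_mem d_in  = clCreateBuffer(ctx, CL_MEM_READ_ONLY,  n * sizeof(float), NULL, NULL);
    cl_mem d_out = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, n * sizeof(float), NULL, NULL);
    clEnqueueWriteBuffer(queue, d_in, CL_TRUE, 0, n * sizeof(float), in, 0, NULL, NULL); // 3. host -> device

    clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_in);
    clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_out);
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL, 0, NULL, NULL);             // 4. launch n work-items

    clEnqueueReadBuffer(queue, d_out, CL_TRUE, 0, n * sizeof(float), out, 0, NULL, NULL); // 5. device -> host
    clFinish(queue);
}
</syntaxhighlight>
Resource-release calls such as <code>clReleaseMemObject</code> and <code>clReleaseKernel</code> are omitted for brevity.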
===C++ AMP===
<math>\mathbf{C}=\mathbf{A}\mathbf{B},\qquad C_{ij}=\sum_{k=1}^m A_{ik}B_{kj}</math>
To compute each element in {{math|'''C'''}} takes {{math|''m''}} multiplications and {{math|(''m'' – 1)}} additions. Therefore, with a CPU implementation, the time complexity of this computation is ''Θ''(''n''<sup>3</sup>), as in the C implementation below.
<syntaxhighlight lang="c">
// M x M matrix multiplication in C: A, B, and C are stored in row-major order.
void matrixMul(const float *A, const float *B, float *C, int M)
{
    for (int i = 0; i < M; i++) {        // row of C
        for (int j = 0; j < M; j++) {    // column of C
            float sum = 0.0f;
            for (int k = 0; k < M; k++)  // dot product of row i and column j
                sum += A[i * M + k] * B[k * M + j];
            C[i * M + j] = sum;
        }
    }
}
</syntaxhighlight>
===Multidimensional convolution===
Convolution is a frequently used operation in DSP. To compute the 2-D convolution of two {{math|''m'' × ''m''}} signals, it requires {{math|''m''<sup>2</sup>}} multiplications and {{math|''m'' × (''m'' – 1)}} additions for each output element. That is, the overall time complexity is ''Θ''(''n''<sup>4</sup>) with a CPU implementation, whereas a GPGPU implementation that assigns one thread to each output element reduces it to ''Θ''(''n''<sup>2</sup>).
2-D convolution equation:
<math>y(n_1,n_2)=\sum_{k_1=0}^{m-1}\sum_{k_2=0}^{m-1}h(k_1,k_2)\,x(n_1-k_1,n_2-k_2)</math>
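A GPGPU implementation can assign one work-item to each output sample. The following OpenCL kernel is a minimal sketch of that idea, assuming an ''n'' × ''n'' input stored in row-major order, an ''m'' × ''m'' filter, zero padding outside the input, and that only the ''n'' × ''n'' core of the output is produced; all names are illustrative.
<syntaxhighlight lang="c">
// 2-D convolution: work-item (i, j) computes output sample y(i, j).
// x is an n x n input, h is an m x m filter, both row-major; samples
// outside the input are treated as zero.
__kernel void convolution2d(__global const float *x,
                            __global const float *h,
                            __global float *y,
                            const int n,
                            const int m)
{
    int i = get_global_id(0);              // output row
    int j = get_global_id(1);              // output column
    float sum = 0.0f;
    for (int k1 = 0; k1 < m; k1++) {
        for (int k2 = 0; k2 < m; k2++) {
            int r = i - k1;                // input row index
            int c = j - k2;                // input column index
            if (r >= 0 && r < n && c >= 0 && c < n)
                sum += h[k1 * m + k2] * x[r * n + c];
        }
    }
    y[i * n + j] = sum;
}
</syntaxhighlight>
Each work-item performs the ''m''<sup>2</sup> multiply–accumulate operations for its own output sample, so all output samples are computed in parallel.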
Note that, although the example demonstrated above is a 2-D convolution, a similar approach can be adopted for higher-dimensional systems. Overall, for an ''s''-D convolution, a GPGPU implementation has time complexity ''Θ''(''n''<sup>''s''</sup>), since each work-item performs the ''m''<sup>''s''</sup> multiply–accumulate operations of a single output sample, whereas a sequential CPU implementation takes ''Θ''(''n''<sup>2''s''</sup>).
M-D convolution equation:
<math>y(n_1,n_2,\ldots,n_s)=\sum_{k_1}\sum_{k_2}\cdots\sum_{k_s}h(k_1,k_2,\ldots,k_s)\,x(n_1-k_1,n_2-k_2,\ldots,n_s-k_s)</math>
===Multidimensional discrete-time Fourier transform===
The multidimensional discrete-time Fourier transform (DTFT) of an ''s''-D signal of size {{math|''m''<sub>1</sub> × ''m''<sub>2</sub> × ... × ''m''<sub>''s''</sub>}} is
<math>X(\Omega_1,\Omega_2,\ldots,\Omega_s)=\sum_{n_1=0}^{m_1-1}\sum_{n_2=0}^{m_2-1}\cdots\sum_{n_s=0}^{m_s-1}x(n_1, n_2,\ldots,n_s)e^{-j(\Omega_1n_1+\Omega_2n_2+\cdots+\Omega_sn_s)}</math>
Practically, to implement an M-D DTFT, we can perform ''M'' passes of 1-D DTFTs together with matrix transposes, one pass for each dimension. With a 1-D DTFT operation, a GPGPU can conceptually reduce the complexity from ''Θ''(''n''<sup>2</sup>) to ''Θ''(''n''), since each frequency sample can be computed by an independent thread.
<syntaxhighlight lang="c">
// 1-D DTFT in OpenCL: each work-item computes one frequency sample.
__kernel void dtft(__global const float *x,   // input samples
                   __global float *re,        // real part of the spectrum
                   __global float *im,        // imaginary part of the spectrum
                   const int n)               // number of samples
{
    int k = get_global_id(0);                 // frequency index of this work-item
    float omega = 2.0f * M_PI_F * k / n;      // sampled frequency
    float sum_re = 0.0f, sum_im = 0.0f;
    for (int t = 0; t < n; t++) {             // accumulate x(t) * e^{-j*omega*t}
        sum_re += x[t] * cos(omega * t);
        sum_im -= x[t] * sin(omega * t);
    }
    re[k] = sum_re;
    im[k] = sum_im;
}
</syntaxhighlight>
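To illustrate the transform-and-transpose (row–column) procedure described above, the following CPU reference sketch computes a 2-D transform by applying a 1-D DFT along each dimension in turn; sampling the DTFT at ''n'' uniformly spaced frequencies per dimension and the function names are illustrative assumptions, and the transpose is realized implicitly through strided indexing.
<syntaxhighlight lang="c">
// Row-column decomposition of a 2-D DFT on the CPU: a 1-D DFT is applied
// along every row and then along every column, which realizes the
// "1-D transform per dimension" procedure described in the text.
#include <complex.h>
#include <math.h>

// 1-D DFT of n complex samples read/written with the given stride.
static void dft_1d(const double complex *in, double complex *out,
                   int n, int stride)
{
    const double pi = acos(-1.0);
    for (int k = 0; k < n; k++) {
        double complex acc = 0;
        for (int t = 0; t < n; t++)
            acc += in[t * stride] * cexp(-I * 2.0 * pi * k * t / n);
        out[k * stride] = acc;
    }
}

// 2-D DFT of an n x n signal stored in row-major order in `data`;
// `tmp` is an n x n scratch buffer. The result is returned in `data`.
void dft_2d(double complex *data, double complex *tmp, int n)
{
    for (int r = 0; r < n; r++)        // transform every row
        dft_1d(data + r * n, tmp + r * n, n, 1);
    for (int c = 0; c < n; c++)        // transform every column of the row result
        dft_1d(tmp + c, data + c, n, n);
}
</syntaxhighlight>
On a GPGPU, the frequency samples of each 1-D transform can be computed by independent work-items, as in the kernel above, which is where the per-transform reduction from ''Θ''(''n''<sup>2</sup>) to ''Θ''(''n'') comes from.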
===Radar signal reconstruction and analysis===
Radar systems usually need to reconstruct numerous 3-D or 4-D data samples in real time. Traditionally, particularly in the military, this required the support of supercomputers. Nowadays, GPGPUs are also employed in place of supercomputers to process radar signals. For example, processing [[Synthetic aperture radar|synthetic aperture radar (SAR)]] signals usually involves multidimensional [[Fast Fourier transform|FFT]] computations.<ref>{{cite book|date=2009-10-01|pages=309–314|doi=10.1109/SIPS.2009.5336272|first1=C.|last1=Clemente|first2=M.|last2=Di Bisceglie|first3=M.|last3=Di Santo|first4=N.|last4=Ranaldo|first5=M.|last5=Spinelli|title=2009 IEEE Workshop on Signal Processing Systems|chapter=Processing of synthetic Aperture Radar data with GPGPU|isbn=978-1-4244-4335-2|s2cid=18932083}}</ref>
===Self-driving cars===