Multidimensional DSP with GPU acceleration

Multidimensional Digital Signal Processing (MDSP) refers to the extension of [[Digital signal processing]] (DSP) techniques to signals that vary in more than one dimension. While conventional DSP typically deals with one-dimensional data, such as time-varying [[Audio signal|audio signals]], MDSP involves processing signals in two or more dimensions. Many of the principles from one-dimensional DSP, such as [[Fourier transform|Fourier transforms]] and [[filter design]], have analogous counterparts in multidimensional signal processing.
 
Modern [[general-purpose computing on graphics processing units|general-purpose graphics processing units]] (GPGPUs) achieve high throughput on vector operations and numeric manipulations through a high degree of parallel computation. Because processing digital signals, particularly multidimensional signals, often involves a series of vector operations on massive numbers of independent data samples, GPGPUs are now widely employed to accelerate multidimensional DSP applications such as [[image processing]], [[Video processing|video codecs]], [[Radar signal characteristics|radar signal analysis]], [[sonar signal processing]], and [[Ultrasound scan|ultrasound scanning]]. Conceptually, GPGPUs can dramatically reduce the computation time when compared with [[central processing unit]]s (CPUs), [[digital signal processor]]s (DSPs), or other [[Field-programmable gate array|FPGA]] accelerators.
 
==Motivation==
Processing multidimensional signals is a common problem in scientific research and engineering computations. Typically, a DSP problem's computation complexity grows exponentially with the number of dimensions. With such a high degree of time and storage complexity, it is extremely difficult to process multidimensional signals in real-time. Although many fast algorithms (e.g. [[Fast Fourier transform|FFT]]) have been proposed for 1-D DSP problems, they are still not efficient enough to be adapted to high-dimensional DSP problems. Therefore, it is still hard to obtain the desired computation results with [[Digital signal processor|digital signal processors (DSPs)]]. Hence, better algorithms and hardware architectures are needed to accelerate multidimensional DSP computations.
 
==Existing approaches==
Practically, to accelerate multidimensional DSP, some common approaches have been proposed and developed in the past decades.
 
===Lower sampling rate===
A makeshift way to meet a real-time requirement in multidimensional DSP applications is to use a lower sampling rate, which reduces the number of samples to be processed at one time and thereby decreases the total processing time. However, by the [[Nyquist–Shannon sampling theorem|sampling theorem]], this can lead to aliasing and poor-quality outputs. Some applications, such as military radars and medical imaging, demand highly precise and accurate results. In such cases, using a lower sampling rate to reduce the amount of computation in the multidimensional DSP domain is not always allowable.
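
The aliasing effect can be illustrated with a minimal C sketch. It assumes a single real sinusoid in one dimension; for a multidimensional signal the same folding would apply along each sampled axis. The function name <code>alias_frequency</code> and its parameters are hypothetical and chosen only for this illustration.

<syntaxhighlight lang="c">
/* Apparent (aliased) frequency of a real sinusoid of frequency f
 * after sampling at rate fs.
 * Assumption: f >= 0 and fs > 0, both in the same unit (e.g. Hz). */
double alias_frequency(double f, double fs)
{
    /* Reduce f modulo fs without needing the math library. */
    double r = f - fs * (double)(long)(f / fs);

    /* Components above fs/2 fold back into the range [0, fs/2]. */
    return (r > fs / 2.0) ? fs - r : r;
}
</syntaxhighlight>

For instance, a 7 Hz tone sampled at 10 Hz is indistinguishable from a 3 Hz tone, so lowering the sampling rate below twice the highest signal frequency discards information that later processing cannot recover.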
 
===Digital signal processors===
Digital signal processors are designed specifically to process vector operations. They have been widely used in DSP computations for decades. However, most digital signal processors can execute only a few operations in parallel. This kind of design is sufficient to accelerate audio processing (1-D signals) and image processing (2-D signals). However, for the large number of data samples in multidimensional signals, it is still not powerful enough to produce computation results in real-time.
 
===Supercomputer assistance===
Accelerating multidimensional DSP computations requires dedicated [[supercomputer]]s or [[Computer cluster|cluster computers]] in some circumstances, e.g., [[weather forecasting]] and military radars. Nevertheless, dedicating supercomputers simply to DSP operations is costly in both money and energy. Also, it is not practical or suitable for all multidimensional DSP applications.
 
===GPU acceleration===
[[Graphics processing unit|GPUs]] were originally devised to accelerate image processing and video stream rendering. Moreover, since modern GPUs can perform numeric computations in parallel at a relatively low cost and with better energy efficiency, they are becoming a popular alternative to supercomputers for multidimensional DSP.<ref>{{cite book|date=2010-09-01|pages=556–561|doi=10.1109/HPCC.2010.56|first1=Slo-Li|last1=Chu|first2=Chih-Chieh|last2=Hsiao|title=2010 IEEE 12th International Conference on High Performance Computing and Communications (HPCC)|chapter=OpenCL: Make Ubiquitous Supercomputing Possible|isbn=978-1-4244-8335-8|s2cid=14586211|language=en}}</ref>
 
==GPGPU computations==
[[File:SIMD GPGPU.jpg|alt=Figure illustrating a SIMD/vector computation unit in GPGPUs.|thumb|GPGPU/SIMD computation model.]]
 
Modern GPU designs are mainly based on the [[Single instruction, multiple data|SIMD]] (Single Instruction Multiple Data) computation paradigm.<ref>{{cite journal|title=NVIDIA Tesla: A Unified Graphics and Computing Architecture|url=http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4523358&url=http%253A%252F%252Fieeexplore.ieee.org%252Fxpls%252Fabs_all.jsp%253Farnumber%253D4523358|journal=IEEE Micro|date=2008-03-01|issn=0272-1732|pages=39–55|volume=28|issue=2|doi=10.1109/MM.2008.31|first1=E.|last1=Lindholm|first2=J.|last2=Nickolls|first3=S.|last3=Oberman|first4=J.|last4=Montrym|bibcode=2008IMicr..28b..39L|s2cid=2793450|language=en}}</ref><ref>{{cite book|title=Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)|last1=Kim|first1=Hyesoon|author1-link=Hyesoon Kim|publisher=Morgan & Claypool Publishers|year=2012|isbn=978-1-60845-954-4|last2=Vuduc|first2=Richard|last3=Baghsorkhi|first3=Sara|last4=Choi|first4=Jee|last5=Hwu|first5=Wen-Mei W.|editor-last=Hill|editor-first=Mark D.|doi=10.2200/S00451ED1V01Y201209CAC020|language=en}}</ref> GPU devices of this type are called [[General-purpose computing on graphics processing units|general-purpose GPUs (GPGPUs)]].
 
GPGPUs are able to perform an operation on multiple independent data elements concurrently with their vector or SIMD functional units. A modern GPGPU can spawn thousands of concurrent threads and process all threads in a batch manner. Because many DSP problems can be solved by [[Divide and conquer algorithms|divide-and-conquer]] algorithms, GPGPUs can readily be employed as DSP accelerators. A large and complex DSP problem can be divided into many small numeric problems that are processed concurrently, so that the overall elapsed time is reduced significantly. For example, multiplying two {{math|''M'' × ''M''}} matrices can be processed by {{math|''M'' × ''M''}} concurrent threads on a GPGPU device without any output data dependency. Therefore, theoretically, GPGPU acceleration can yield up to an {{math|''M'' × ''M''}} speedup compared with a traditional CPU or digital signal processor.
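
The upper bound on the speedup can be made explicit with a rough model. The model is an assumption introduced here for illustration only: it ignores memory traffic and kernel launch overhead, and the symbol {{math|''P''}} (the number of output elements the device can process at the same time) does not come from the cited sources. Each output element costs {{math|Θ(''M'')}} operations, so

<math>T_\text{serial}=\Theta(M^3),\qquad T_\text{parallel}=\Theta\!\left(\left\lceil\frac{M^2}{P}\right\rceil M\right),\qquad \frac{T_\text{serial}}{T_\text{parallel}}\approx\min(P,\,M^2)</math>

Under this model the {{math|''M'' × ''M''}} speedup is reached only when {{math|''P'' ≥ ''M''<sup>2</sup>}}; with fewer parallel units the gain is limited by {{math|''P''}}.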
 
==GPU programming languages==
Currently, there are several existing programming languages or interfaces which support GPGPU programming.
 
=== CUDA ===
[[CUDA]] is the standard interface to program [[Nvidia|NVIDIA]] GPUs. NVIDIA also provides many CUDA libraries to support DSP acceleration on NVIDIA GPU devices.<ref>{{cite web|title=Parallel Programming and Computing Platform {{!}} CUDA {{!}} NVIDIA {{!}} NVIDIA|url=http://www.nvidia.com/object/cuda_home_new.html|website=www.nvidia.com|access-date=2015-11-05|archive-url=https://web.archive.org/web/20140106051908/http://www.nvidia.com/object/cuda_home_new.html|archive-date=2014-01-06|url-status=dead|language=en}}</ref>
 
=== OpenCL ===
[[OpenCL]] is an industry standard which was originally proposed by [[Apple Inc.]] and is now maintained and developed by the [[Khronos Group]].<ref>{{cite web|title=OpenCL - The open standard for parallel programming of heterogeneous systems|url=https://www.khronos.org/opencl/|website=www.khronos.org|date=21 July 2013|access-date=2015-11-05|language=en}}</ref> OpenCL provides [[C (programming language)|C]]-like [[Application programming interface|APIs]] for programming different devices universally, including GPGPUs.
[[File:OpenCL program execution flow rev.jpg|alt=Illustrating the execution flow of an OpenCL program/kernel|thumb|OpenCL program execution flow|285x285px]]
The figure illustrates the execution flow of launching an OpenCL program on a GPU device. The CPU first detects OpenCL devices (a GPU in this case) and then invokes a just-in-time compiler to translate the OpenCL source code into a target binary. The CPU then sends data to the GPU to perform computations. While the GPU is processing data, the CPU is free to process its own tasks.
 
===C++ AMP===
[[C++ AMP]] is a programming model proposed by [[Microsoft]]. C++ AMP is a [[C++]]-based library designed for programming SIMD processors.<ref>{{cite web|title=C++ AMP (C++ Accelerated Massive Parallelism)|url=https://msdn.microsoft.com/en-us/library/hh265137.aspx|website=msdn.microsoft.com|access-date=2015-11-05|language=en}}</ref>
 
===OpenACC===
[[OpenACC]] is a programming standard for [[parallel computing]] developed by [[Cray]], CAPS, [[Nvidia|NVIDIA]] and PGI.<ref>{{cite web|title=OpenACC Home {{!}} www.openacc.org|url=http://www.openacc.org/|website=www.openacc.org|access-date=2015-11-05|language=en}}</ref> OpenACC targets programming for CPU and GPU heterogeneous systems with [[C (programming language)|C]], [[C++]], and [[Fortran]] extensions.
 
==Examples of GPU programming for multidimensional DSP==

==={{math|''m'' × ''m''}} matrix multiplication===
Suppose {{math|'''A'''}} and {{math|'''B'''}} are two {{math|''m'' × ''m''}} matrices and we would like to compute {{math|1 = '''C''' = '''A''' × '''B'''}}.
 
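<math>\mathbf{A}=\begin{pmatrix}
A_{11} & A_{12} & \cdots & A_{1m} \\
A_{21} & A_{22} & \cdots & A_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
A_{m1} & A_{m2} & \cdots & A_{mm} \\
\end{pmatrix},\quad \mathbf{B}=\begin{pmatrix}
B_{11} & B_{12} & \cdots & B_{1m} \\
B_{21} & B_{22} & \cdots & B_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
B_{m1} & B_{m2} & \cdots & B_{mm} \\
\end{pmatrix}</math>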
 
<math>\mathbf{C}=\mathbf{A}\times\mathbf{B}=\begin{pmatrix}
C_{11} & C_{12} & \cdots & C_{1m} \\
C_{21} & C_{22} & \cdots & C_{2m} \\
\vdots & \vdots & \ddots & \vdots \\
C_{m1} & C_{m2} & \cdots & C_{mm} \\
\end{pmatrix},\quad C_{ij}=\sum_{k=1}^m A_{ik}B_{kj}</math>
 
Computing each element of {{math|'''C'''}} takes {{math|''m''}} multiplications and {{math|(''m'' − 1)}} additions. Therefore, with a CPU implementation, the time complexity of this computation is {{math|Θ(''m''<sup>3</sup>)}}, as in the following C example. However, the elements of {{math|'''C'''}} are independent of each other. Hence, the computation can be fully parallelized by SIMD processors, such as GPGPU devices. With a GPGPU implementation in which each of the {{math|''m''<sup>2</sup>}} output elements is computed by its own thread, the elapsed time conceptually reduces to {{math|Θ(''m'')}}: the two outer for-loops are replaced by the work-item index, as shown in the following OpenCL example.
<syntaxhighlight lang="c" line="1">
// MxM matrix multiplication in C
// Parameter list assumed: A, B and C are row-major size x size matrices
void matrixMul(
    const float *A,
    const float *B,
    float *C,
    int size)
{
    for (int row = 0; row < size; row++) {
        for (int col = 0; col < size; col++) {
            int id = row * size + col;
            float sum = 0.0f;

            for (int m = 0; m < size; m++) {
                sum += (A[row * size + m] * B[m * size + col]);
            }

            C[id] = sum;
        }
    }
}
</syntaxhighlight><syntaxhighlight lang="c++" line="1">
// MxM matrix multiplication in OpenCL
// Parameter list and indexing assumed: one work-item per element of C
__kernel void matrixMul(
    __global const float *A,
    __global const float *B,
    __global float *C,
    int size)
{
    size_t id = get_global_id(0);
    size_t row = id / size;
    size_t col = id % size;
    float sum = 0.0f;

    // size iterations per work-item
    for (int m = 0; m < size; m++) {
        sum += (A[row * size + m] * B[m * size + col]);
    }

    C[id] = sum;
}
</syntaxhighlight>
 
===Multidimensional convolution===
Convolution is a frequently used operation in DSP. To compute the 2-D convolution of two {{math|''m'' × ''m''}} signals, each output element requires {{math|''m''<sup>2</sup>}} multiplications and {{math|(''m''<sup>2</sup> − 1)}} additions. That is, the overall time complexity is {{math|Θ(''m''<sup>4</sup>)}} for the entire output signal. As the following OpenCL example shows, with GPGPU acceleration, the total computation time effectively decreases to {{math|Θ(''m''<sup>2</sup>)}} since all output elements are data independent.

2-D convolution equation:
 
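<math>y(n_1, n_2)=x(n_1,n_2)**h(n_1,n_2)=\sum_{k_1=0}^{m-1}\sum_{k_2=0}^{m-1}x(k_1, k_2)h(n_1-k_1, n_2-k_2)</math>
<syntaxhighlight lang="c++" line="1">
// 2-D convolution implementation in OpenCL
// Parameter list, indexing and inner loop assumed: A (signal x), B (response h)
// and C (output y) are row-major size x size arrays, zero outside that range
__kernel void convolution(
    __global const float *A,
    __global const float *B,
    __global float *C,
    int size)
{
    size_t id = get_global_id(0);
    size_t n1 = id / size;
    size_t n2 = id % size;
    float sum = 0.0f;

    // size x size iterations per work-item
    for (int k1 = 0; k1 < size; k1++) {
        for (int k2 = 0; k2 < size; k2++) {
            if (n1 >= k1 && n2 >= k2)
                sum += A[k1 * size + k2] * B[(n1 - k1) * size + (n2 - k2)];
        }
    }

    C[id] = sum;
}
</syntaxhighlight>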
 
Note that, although the example above is a 2-D convolution, a similar approach can be adopted for higher-dimensional systems. Overall, for an {{math|''s''}}-D convolution of signals with {{math|''m''}} samples per dimension, a GPGPU implementation has time complexity {{math|Θ(''m''<sup>''s''</sup>)}}, whereas a CPU implementation has time complexity {{math|Θ(''m''<sup>2''s''</sup>)}}.

M-D convolution equation:
 
<math>y(n_1,n_2,...,n_s)=x(n_1,n_2,...,n_s)**h(n_1,n_2,...,n_s)=\sum_{k_1=0}^{m_1-1}\sum_{k_2=0}^{m_2-1}...\sum_{k_s=0}^{m_s-1}x(k_1, k_2,...,k_s)h(n_1-k_1,n_2-k_2,...,n_s-k_s)</math>
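
As a sketch of how the one-thread-per-output idea might extend to {{math|1=''s'' = 3}}, the following C code computes one output sample from a flat index, the role a single work-item would play on a GPGPU, and emulates the launch with a serial loop. The function names <code>conv3d_element</code> and <code>conv3d</code> are hypothetical, and the sketch assumes cubic, row-major arrays that are zero outside their stored range.

<syntaxhighlight lang="c">
#include <stddef.h>

/* One output sample of a 3-D convolution; plays the role of one work-item.
 * Assumption: x and h are row-major size*size*size arrays, zero elsewhere. */
float conv3d_element(const float *x, const float *h, int size, size_t id)
{
    /* Recover the 3-D coordinates (n1, n2, n3) from the flat index. */
    int n1 = (int)(id / ((size_t)size * size));
    int n2 = (int)((id / size) % size);
    int n3 = (int)(id % size);
    float sum = 0.0f;

    /* Only k <= n contributes, because h is zero for negative indices. */
    for (int k1 = 0; k1 <= n1; k1++)
        for (int k2 = 0; k2 <= n2; k2++)
            for (int k3 = 0; k3 <= n3; k3++)
                sum += x[(k1 * size + k2) * size + k3]
                     * h[((n1 - k1) * size + (n2 - k2)) * size + (n3 - k3)];
    return sum;
}

/* Serial stand-in for launching size^3 independent work-items. */
void conv3d(const float *x, const float *h, float *y, int size)
{
    size_t total = (size_t)size * size * size;
    for (size_t id = 0; id < total; id++)
        y[id] = conv3d_element(x, h, size, id);
}
</syntaxhighlight>

Because <code>conv3d_element</code> reads only its inputs and writes nothing shared, the iterations of the loop in <code>conv3d</code> are independent of each other, which is the property that lets a GPGPU run them concurrently.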
 
===Multidimensional discrete-time Fourier transform (M-D DTFT)===
In addition to convolution, the [[Fourier transform|discrete-time Fourier transform (DTFT)]] is another technique which is often used in system analysis.
 
<math>X(\Omega_1,\Omega_2,...,\Omega_s)=\sum_{n_1=0}^{m_1-1}\sum_{n_2=0}^{m_2-1}...\sum_{n_s=0}^{m_s-1}x(n_1, n_2,...,n_s)e^{-j(\Omega_1n_1+\Omega_2n_2+...+\Omega_sn_s)}</math>
 
Practically, to implement an M-D DTFT, if the system is assumed to be separable, we can perform a 1-D DFT along each of the M dimensions, with a matrix transpose between passes. For a 1-D DTFT of a length-{{math|''n''}} signal, a GPGPU with one thread per output frequency can conceptually reduce the elapsed time from {{math|Θ(''n''<sup>2</sup>)}} to {{math|Θ(''n'')}}, as illustrated by the following OpenCL example. That is, an M-D DTFT can conceptually be computed on a GPU with a complexity of {{math|Θ(''n''<sup>2</sup>)}}. Since some GPGPUs are also equipped with hardware FFT accelerators internally, this implementation might be further optimized by invoking the FFT APIs or libraries provided by GPU manufacturers.<ref>{{cite web|title=OpenCL™ Optimization Case Study Fast Fourier Transform - Part II - AMD|url=http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-fast-fourier-transform-part-ii/|website=AMD|access-date=2015-11-05|language=en-US}}</ref>
<syntaxhighlight lang="c++" line="1">
// DTFT in OpenCL, evaluated at the frequencies 2*pi*id/size
// Parameter list assumed: real and imaginary parts are stored in separate arrays
__kernel void dtft(
    __global const float *x_re,
    __global const float *x_im,
    __global float *X_re,
    __global float *X_im,
    int size)
{
    size_t id = get_global_id(0);
    X_re[id] = 0.0f;
    X_im[id] = 0.0f;

    for (int i = 0; i < size; i++) {
        float angle = 2.0f * 3.1415927f * id * i / size;
        X_re[id] += (x_re[i] * cos(angle) + x_im[i] * sin(angle));
        X_im[id] += (x_im[i] * cos(angle) - x_re[i] * sin(angle));
    }
}
</syntaxhighlight>
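
The separable, pass-per-dimension structure described above can be sketched in C for the 2-D case. This is an illustration under stated assumptions, not the implementation from the cited sources: the names <code>transform1d_fn</code>, <code>transform2d</code> and <code>dft2</code> are hypothetical, the second pass uses strided access in place of an explicit transpose, and the 1-D transform shown is only the 2-point DFT, whose twiddle factors are +1 and −1.

<syntaxhighlight lang="c">
#include <stddef.h>

/* Hypothetical callback: in-place 1-D transform of n complex samples
 * whose consecutive elements are `stride` floats apart. */
typedef void (*transform1d_fn)(float *re, float *im, size_t n, size_t stride);

/* Row-column decomposition on a rows x cols row-major complex array.
 * Every call inside one pass is independent of the others, so each pass
 * could be spread across GPU threads. */
void transform2d(float *re, float *im, size_t rows, size_t cols,
                 transform1d_fn f)
{
    for (size_t r = 0; r < rows; r++)   /* pass 1: transform every row */
        f(re + r * cols, im + r * cols, cols, 1);
    for (size_t c = 0; c < cols; c++)   /* pass 2: transform every column */
        f(re + c, im + c, rows, cols);
}

/* Example 1-D transform: the 2-point DFT, X[0] = x[0] + x[1] and
 * X[1] = x[0] - x[1]. The length argument is ignored (fixed at 2). */
void dft2(float *re, float *im, size_t n, size_t stride)
{
    (void)n;
    float a_re = re[0], a_im = im[0];
    float b_re = re[stride], b_im = im[stride];
    re[0] = a_re + b_re;
    im[0] = a_im + b_im;
    re[stride] = a_re - b_re;
    im[stride] = a_im - b_im;
}
</syntaxhighlight>

Calling <code>transform2d(re, im, 2, 2, dft2)</code> on a 2 × 2 array applies the row pass and then the column pass. A general-length 1-D DFT or a vendor FFT routine could be substituted for <code>dft2</code> without changing the driver.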
 
==Real applications==
 
===Digital filter design===
Designing a multidimensional digital filter is a big challenge, especially for [[Infinite impulse response|IIR]] filters. Typically it relies on computers to solve difference equations and obtain a set of approximated solutions. As GPGPU computation has become popular, several adaptive algorithms have been proposed to design multidimensional [[Finite impulse response|FIR]] and/or [[Infinite impulse response|IIR]] filters by means of GPGPUs.<ref>{{cite book|publisher=ACM|date=2011-01-01|location=New York, NY, USA|isbn=978-1-4503-0807-6|pages=176:1–176:12|series=SA '11|doi=10.1145/2024156.2024210|first1=Diego|last1=Nehab|first2=André|last2=Maximo|first3=Rodolfo S.|last3=Lima|first4=Hugues|last4=Hoppe|title=Proceedings of the 2011 SIGGRAPH Asia Conference|chapter=GPU-efficient recursive filtering and summed-area tables|s2cid=3014398|url=https://dl.acm.org/doi/abs/10.1145/2024156.2024210|url-access=limited|language=en}}</ref><ref>{{cite book|title=GPU Gems 2: Programming Techniques For High-Performance Graphics And General-Purpose Computation|last1=Pharr|first1=Matt|publisher=Pearson Addison Wesley|year=2005|isbn=978-0-321-33559-3|last2=Fernando|first2=Randima|language=en}}</ref><ref>{{cite book|title=GPU Computing Gems Emerald Edition|last=Hwu|first=Wen-mei W.|publisher=Morgan Kaufmann Publishers Inc.|year=2011|isbn=978-0-12-385963-1|location=San Francisco, CA, USA|language=en}}</ref>
 
===Radar signal reconstruction and analysis===
Radar systems usually need to reconstruct numerous 3-D or 4-D data samples in real-time. Traditionally, particularly in the military, this has required supercomputer support. Nowadays, GPGPUs are also employed in place of supercomputers to process radar signals. For example, processing [[Synthetic aperture radar|synthetic aperture radar (SAR)]] signals usually involves multidimensional [[Fast Fourier transform|FFT]] computations.<ref>{{cite book|date=2009-10-01|pages=309–314|doi=10.1109/SIPS.2009.5336272|first1=C.|last1=Clemente|first2=M.|last2=Di Bisceglie|first3=M.|last3=Di Santo|first4=N.|last4=Ranaldo|first5=M.|last5=Spinelli|title=2009 IEEE Workshop on Signal Processing Systems|chapter=Processing of synthetic Aperture Radar data with GPGPU|isbn=978-1-4244-4335-2|s2cid=18932083|language=en}}</ref><ref>{{cite book|date=2009-10-01|pages=1–5|doi=10.1109/CISP.2009.5304418|first1=Bin|last1=Liu|first2=Kaizhi|last2=Wang|first3=Xingzhao|last3=Liu|first4=Wenxian|last4=Yu|title=2009 2nd International Congress on Image and Signal Processing|chapter=An Efficient SAR Processor Based on GPU via CUDA|isbn=978-1-4244-4129-7|s2cid=18801932}}</ref><ref>{{cite book|date=2014-06-01|pages=455–458|doi=10.1109/MIXDES.2014.6872240|first1=P.|last1=Monsurro|first2=A.|last2=Trifiletti|first3=F.|last3=Lannutti|title=2014 Proceedings of the 21st International Conference Mixed Design of Integrated Circuits and Systems (MIXDES)|chapter=Implementing radar algorithms on CUDA hardware|isbn=978-83-63578-05-3|s2cid=16482715}}</ref> GPGPUs can be used to rapidly perform FFT and/or iFFT in this kind of application.
 
===Self-driving cars===
Many [[self-driving car]]s apply 3-D image recognition techniques to control the vehicles automatically. Clearly, to accommodate the fast-changing exterior environment, the recognition and decision processes must be done in real-time. GPGPUs are well suited to this goal.<ref>{{cite book|date=2012-12-01|pages=472–481|doi=10.1109/ICPADS.2012.71|first1=Jianbin|last1=Fang|first2=A.L.|last2=Varbanescu|first3=Jie|last3=Shen|first4=H.|last4=Sips|first5=G.|last5=Saygili|first6=L.|last6=van der Maaten|title=2012 IEEE 18th International Conference on Parallel and Distributed Systems|chapter=Accelerating Cost Aggregation for Real-Time Stereo Matching|isbn=978-1-4673-4565-1|s2cid=14737126}}</ref>
 
===Medical image processing===
In order to have accurate diagnosis, 2-D or 3-D medical signals, such as [[ultrasound]], [[X-ray]], [[Magnetic resonance imaging|MRI]], and [[CT scan|CT]], often require very high sampling rates and image resolutions to reconstruct images. By applying GPGPUs' superior computation power, it was shown that we can acquire better-quality medical images.<ref>{{cite web|title=Medical Imaging{{!}}NVIDIA|url=http://www.nvidia.com/object/medical_imaging.html|website=www.nvidia.com|access-date=2015-11-07|language=en}}</ref><ref>{{cite book|volume=5|date=2005-01-01|pages=5145–5148|doi=10.1109/IEMBS.2005.1615635|pmid=17281405|first1=Yang|last1=Heng|first2=Lixu|last2=Gu|title=2005 IEEE Engineering in Medicine and Biology 27th Annual Conference|chapter=GPU-based Volume Rendering for Medical Image Visualization|isbn=978-0-7803-8741-6|s2cid=17401263}}</ref>
 
==References==
{{Parallel computing}}
 
[[:Category:Digital signal processors]]
[[:Category:Digital signal processing]]
[[:Category:GPGPU]]
[[:Category:Parallel computing]]
 
{{AFC submission|||ts=20151112015249|u=Sing0512|ns=118}}