Revision as of 00:35, 13 November 2015 by Wiae (m, cleanup using autoFormatter)
Revision as of 05:29, 13 November 2015 by Yobot (m, WP:CHECKWIKI error fixes, added orphan tag using AWB (11749))
Line 1:
{{Orphan\|date=November 2015}}

[[Digital signal processing]] (DSP) is a ubiquitous methodology in scientific and engineering computations. In practice, however, DSP problems are often not just 1-D. For instance, image data are 2-D signals and radar signals are 3-D signals. As the number of dimensions increases, the time and/or storage complexity of processing digital signals grows dramatically. Therefore, solving multidimensional DSP problems in real time is extremely difficult. Modern [[General-purpose computing on graphics processing units\|general-purpose graphics processing units (GPGPUs)]] offer excellent throughput on vector operations and numeric manipulations through a high degree of parallel computation. Because processing digital signals, particularly multidimensional signals, often involves a series of vector operations on massive numbers of independent data samples, GPGPUs are now widely employed to accelerate multidimensional DSP, such as [[image processing]], [[Video processing\|video codecs]], [[Radar signal characteristics\|radar signal analysis]], [[sonar signal processing]], and [[ultrasound scan]]ning. Conceptually, using GPGPU devices to perform multidimensional DSP can dramatically reduce the computation time compared with [[Cpu\|central processing units (CPUs)]], [[Digital signal processor\|digital signal processors (DSPs)]], or [[Field-programmable gate array\|FPGA]] accelerators.

== Motivation ==
Line 16 ⟶ 18:
=== Supercomputer Assistance ===
To accelerate multidimensional DSP computations, dedicated [[supercomputer]]s or [[Computer cluster\|cluster computers]] are required in some circumstances, e.g., [[weather forecasting]] and military radars. Nevertheless, dedicating supercomputers solely to DSP operations incurs considerable monetary cost and energy consumption.
This approach is neither practical nor suitable for all multidimensional DSP applications.

=== GPU Acceleration ===
[[Graphics processing unit\|GPUs]] were originally devised to accelerate image processing and video stream rendering. Moreover, since modern GPUs can perform numeric computations in parallel at relatively low cost and with better energy efficiency, they are becoming a popular alternative to supercomputers for multidimensional DSP.<ref>{{cite journal\|title = OpenCL: Make Ubiquitous Supercomputing Possible\|url = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5581488&tag=1\|journal = 2010 12th IEEE International Conference on High Performance Computing and Communications (HPCC)\|date = 2010-09-01\|pages = 556–561\|doi = 10.1109/HPCC.2010.56\|first = Slo-Li\|last = Chu\|first2 = Chih-Chieh\|last2 = Hsiao}}</ref>

== GPGPU Computations ==
[[File:SIMD GPGPU.jpg\|alt=Figure illustrating a SIMD/vector computation unit in GPGPUs.\|thumb\|GPGPU/SIMD computation model.]]
Modern GPU designs are mainly based on the [[SIMD]] computation paradigm.<ref>{{cite journal\|title = NVIDIA Tesla: A Unified Graphics and Computing Architecture\|url = http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4523358&url=http%253A%252F%252Fieeexplore.ieee.org%252Fxpls%252Fabs_all.jsp%253Farnumber%253D4523358\|journal = IEEE Micro\|date = 2008-03-01\|issn = 0272-1732\|pages = 39–55\|volume = 28\|issue = 2\|doi = 10.1109/MM.2008.31\|first = E.\|last = Lindholm\|first2 = J.\|last2 = Nickolls\|first3 = S.\|last3 = Oberman\|first4 = J.\|last4 = Montrym}}</ref><ref>{{cite book\|title = Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)\|last = Kim\|first = Hyesoon\|publisher = Morgan & Claypool Publishers\|year = 2012\|isbn = 978-1-60845-954-4\|location = \|pages = \|last2 = Vuduc\|first2 = Richard\|last3 = Baghsorkhi\|first3 = Sara\|last4 = Choi\|first4 = Jee\|last5 = Hwu\|first5 = Wen-Mei W.\|editor-last = 
Hill\|editor-first = Mark D.\|doi = 10.2200/S00451ED1V01Y201209CAC020}}</ref> GPU devices of this type are called [[General-purpose computing on graphics processing units\|general-purpose GPUs (GPGPUs)]]. GPGPUs can perform an operation on multiple independent data elements concurrently with their vector or SIMD functional units. A modern GPGPU can spawn thousands of concurrent threads and process them in batches. Because of this, GPGPUs can readily serve as DSP accelerators, since many DSP problems can be solved by [[Divide and conquer algorithms\|divide-and-conquer]] algorithms. A large, complex DSP problem can be divided into many small numeric problems that are processed at the same time, so that the overall time complexity is reduced significantly. For example, multiplying two {{math\|''M'' × ''M''}} matrices can be processed by {{math\|''M'' × ''M''}} concurrent threads on a GPGPU device, because no output element depends on any other. Therefore, in theory, GPGPU acceleration can yield up to an {{math\|''M'' × ''M''}} speedup compared with a traditional CPU or digital signal processor, provided the device can actually run that many threads concurrently.
Line 32 ⟶ 34:
=== CUDA ===
[[CUDA]] is the standard interface for programming [[Nvidia\|NVIDIA]] GPUs. NVIDIA also provides many CUDA libraries to support DSP acceleration on NVIDIA GPU devices.<ref>{{cite web\|title = Parallel Programming and Computing Platform {{!}} CUDA {{!}} NVIDIA {{!}} NVIDIA\|url = http://www.nvidia.com/object/cuda_home_new.html\|website = www.nvidia.com\|publisher = https://plus.google.com/104889184472622775891\|accessdate = 2015-11-05}}</ref>

=== OpenCL ===
[[OpenCL]] is an industry standard that was originally proposed by [[Apple Inc.]] and is now maintained and developed by the [[Khronos Group]].<ref>{{cite web\|title = OpenCL – The open standard for parallel programming of heterogeneous systems\|url = https://www.khronos.org/opencl/\|website = www.khronos.org\|accessdate = 2015-11-05}}</ref>
OpenCL provides [[C (programming language)\|C]]/[[C++]]-like [[Application programming interface\|APIs]] for programming different devices, including GPGPUs, in a uniform way.
[[File:OpenCL program execution flow.jpg\|alt=Illustrating the execution flow of an OpenCL program/kernel\|thumb\|OpenCL program execution flow\|285x285px]]
Line 42 ⟶ 44:
=== OpenACC ===
[[OpenACC]] is a programming standard for parallel computing developed by [[Cray]], CAPS, [[Nvidia\|NVIDIA]] and PGI.<ref>{{cite web\|title = OpenACC Home {{!}} www.openacc.org\|url = http://www.openacc.org/\|website = www.openacc.org\|accessdate = 2015-11-05}}</ref> OpenACC targets heterogeneous CPU and GPU systems through [[C (programming language)\|C]], [[C++]], and [[Fortran]] extensions.

== Examples of GPU Programming for Multidimensional DSP ==
=== {{math\|''m'' × ''m''}} Matrix Multiplication ===
Suppose {{math\|'''A'''}} and {{math\|'''B'''}} are two {{math\|''m'' × ''m''}} matrices and we would like to compute {{math\|1 = '''C''' = '''A''' × '''B'''}}.
Line 149 ⟶ 152:
<math>X(\Omega_1,\Omega_2,\ldots,\Omega_s)=\sum_{n_1=0}^{m_1-1}\sum_{n_2=0}^{m_2-1}\cdots\sum_{n_s=0}^{m_s-1}x(n_1, n_2,\ldots,n_s)e^{-j(\Omega_1n_1+\Omega_2n_2+\cdots+\Omega_sn_s)}</math>

In practice, if the system is assumed to be separable, an M-D DTFT can be implemented by performing a 1-D DFT along each of the M dimensions, with a matrix transpose between dimensions.
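The separable, dimension-by-dimension procedure can be sketched on a CPU as follows. This is only a minimal illustration under stated assumptions, not the article's OpenCL kernel: it assumes the DTFT is sampled at the frequencies Ω = 2π''k''/''N'' (that is, a DFT), it uses a direct ''Θ''(''n''<sup>2</sup>) 1-D transform rather than an FFT, and it covers only the 2-D case. The names <code>dft1d</code>, <code>transpose</code>, and <code>dft2d</code> are hypothetical helpers introduced for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;
using Matrix = std::vector<std::vector<cd>>;

// Direct 1-D DFT: X[k] = sum over n of x[n] * exp(-j*2*pi*k*n/N).
// Every output sample k is independent of the others, so a GPGPU could
// assign one thread per k; this sketch simply loops over k serially.
std::vector<cd> dft1d(const std::vector<cd>& x) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    std::vector<cd> X(N);
    for (std::size_t k = 0; k < N; ++k) {
        cd acc(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * n) / static_cast<double>(N);
            acc += x[n] * std::polar(1.0, angle);
        }
        X[k] = acc;
    }
    return X;
}

// Plain matrix transpose (assumes a rectangular matrix).
Matrix transpose(const Matrix& a) {
    if (a.empty()) return {};
    Matrix t(a[0].size(), std::vector<cd>(a.size()));
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < a[i].size(); ++j)
            t[j][i] = a[i][j];
    return t;
}

// 2-D DFT of a separable system: transform every row, transpose,
// transform every row again (the former columns), and transpose back.
Matrix dft2d(const Matrix& x) {
    Matrix rows;
    rows.reserve(x.size());
    for (const auto& r : x) rows.push_back(dft1d(r));
    Matrix cols = transpose(rows);
    for (auto& c : cols) c = dft1d(c);
    return transpose(cols);
}
```

For the 2 × 2 input with rows (1, 2) and (3, 4), <code>dft2d</code> returns rows (10, −2) and (−4, 0), up to floating-point rounding. On a GPGPU, each call to <code>dft1d</code> and each output sample inside it could in principle run in its own thread, since none depends on another.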
For a 1-D transform, a GPGPU can conceptually reduce the time complexity from ''Θ''(''n''<sup>2</sup>) to ''Θ''(''n''), as illustrated by the following OpenCL implementation. That is, an M-D DTFT can be computed on a GPGPU with complexity ''Θ''(''n''<sup>2</sup>). Since some GPGPUs are also equipped with internal hardware FFT accelerators, this implementation might be further optimized by invoking the FFT APIs or libraries provided by GPU manufacturers.<ref>{{cite web\|title = OpenCL™ Optimization Case Study Fast Fourier Transform – Part II – AMD\|url = http://developer.amd.com/resources/documentation-articles/articles-whitepapers/opencl-optimization-case-study-fast-fourier-transform-part-ii/\|website = AMD\|accessdate = 2015-11-05\|language = en-US}}</ref>
<syntaxhighlight lang="c++" line="1">
// DTFT in OpenCL
__kernel void convolution(
Line 170 ⟶ 173:
== Real Applications ==
=== Digital Filter Design ===
Designing multidimensional digital filters is a big challenge, especially for [[Infinite impulse response\|IIR]] filters. It typically relies on computers to solve difference equations and obtain a set of approximate solutions.
As GPGPU computation has become popular, several adaptive algorithms have been proposed for designing multidimensional [[Finite impulse response\|FIR]] and/or [[Infinite impulse response\|IIR]] filters by means of GPGPUs.<ref>{{cite journal\|title = GPU-efficient Recursive Filtering and Summed-area Tables\|url = http://doi.acm.org/10.1145/2024156.2024210\|publisher = ACM\|journal = Proceedings of the 2011 SIGGRAPH Asia Conference\|date = 2011-01-01\|location = New York, NY, USA\|isbn = 978-1-4503-0807-6\|pages = 176:1–176:12\|series = SA '11\|doi = 10.1145/2024156.2024210\|first = Diego\|last = Nehab\|first2 = André\|last2 = Maximo\|first3 = Rodolfo S.\|last3 = Lima\|first4 = Hugues\|last4 = Hoppe}}</ref><ref>{{cite book\|title = GPU Gems 2: Programming Techniques For High-Performance Graphics And General-Purpose Computation\|last = Pharr\|first = Matt\|publisher = Pearson Addison Wesley\|year = 2005\|isbn = 0-321-33559-7\|location = \|pages = \|last2 = Fernando\|first2 = Randima}}</ref><ref>{{cite book\|title = GPU Computing Gems Emerald Edition\|last = Hwu\|first = Wen-mei W.\|publisher = Morgan Kaufmann Publishers Inc.\|year = 2011\|isbn = 0-12-385963-8\|location = San Francisco, CA, USA\|pages = }}</ref>

=== Radar Signal Reconstruction and Analysis ===
Radar systems usually need to reconstruct a large number of 3-D or 4-D data samples in real time. Traditionally, particularly in military applications, this has required the support of supercomputers. Nowadays, GPGPUs are also employed in place of supercomputers to process radar signals. For example, processing [[Synthetic aperture radar\|synthetic aperture radar (SAR)]] signals usually involves multidimensional [[Fast Fourier transform\|FFT]] computations.<ref>{{cite journal\|title = Processing of synthetic Aperture Radar data with GPGPU\|url = http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=5336272&newsearch=true&queryText=gpgpu%2520radar\|journal = IEEE Workshop on Signal Processing Systems, 2009. 
SiPS 2009\|date = 2009-10-01\|pages = 309–314\|doi = 10.1109/SIPS.2009.5336272\|first = C.\|last = Clemente\|first2 = M.\|last2 = Di Bisceglie\|first3 = M.\|last3 = Di Santo\|first4 = N.\|last4 = Ranaldo\|first5 = M.\|last5 = Spinelli}}</ref><ref>{{cite journal\|title = An Efficient SAR Processor Based on GPU via CUDA\|url = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5304418\|journal = 2nd International Congress on Image and Signal Processing, 2009. CISP '09\|date = 2009-10-01\|pages = 1–5\|doi = 10.1109/CISP.2009.5304418\|first = Bin\|last = Liu\|first2 = Kaizhi\|last2 = Wang\|first3 = Xingzhao\|last3 = Liu\|first4 = Wenxian\|last4 = Yu}}</ref><ref>{{cite journal\|title = Implementing radar algorithms on CUDA hardware\|url = http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6872240\|journal = Mixed Design of Integrated Circuits Systems (MIXDES), 2014 Proceedings of the 21st International Conference\|date = 2014-06-01\|pages = 455–458\|doi = 10.1109/MIXDES.2014.6872240\|first = P.\|last = Monsurro\|first2 = A.\|last2 = Trifiletti\|first3 = F.\|last3 = Lannutti}}</ref> GPGPUs can be used to rapidly perform FFT and/or inverse FFT (iFFT) computations in this kind of application.

=== Self-Driving Car ===
Many [[self-driving cars]] apply 3-D image recognition techniques to control the vehicle automatically. Clearly, to accommodate the fast-changing exterior environment, the recognition and decision processes must be done in real time.
GPGPUs are well suited to meeting this real-time requirement.<ref>{{cite journal\|title = Accelerating Cost Aggregation for Real-Time Stereo Matching\|url = http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6413661\|journal = 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS)\|date = 2012-12-01\|pages = 472–481\|doi = 10.1109/ICPADS.2012.71\|first = Jianbin\|last = Fang\|first2 = A.L.\|last2 = Varbanescu\|first3 = Jie\|last3 = Shen\|first4 = H.\|last4 = Sips\|first5 = G.\|last5 = Saygili\|first6 = L.\|last6 = van der Maaten}}</ref>

=== Medical Image Processing ===
For accurate diagnosis, 2-D or 3-D medical signals, such as [[ultrasound]], [[X-ray]], [[Magnetic resonance imaging\|MRI]], and [[CT scan\|CT]], often require very high sampling rates and image resolutions to reconstruct images. By applying GPGPUs' superior computation power, it has been shown that better-quality medical images can be obtained.<ref>{{cite web\|title = Medical Imaging{{!}}NVIDIA\|url = http://www.nvidia.com/object/medical_imaging.html\|website = www.nvidia.com\|publisher = https://plus.google.com/104889184472622775891\|accessdate = 2015-11-07}}</ref><ref>{{cite journal\|title = GPU-based Volume Rendering for Medical Image Visualization\|url = http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=1615635&url=http%253A%252F%252Fieeexplore.ieee.org%252Fxpls%252Fabs_all.jsp%253Farnumber%253D1615635\|journal = Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the\|date = 2005-01-01\|pages = 5145–5148\|doi = 10.1109/IEMBS.2005.1615635\|first = Yang\|last = Heng\|first2 = Lixu\|last2 = Gu}}</ref>

== References ==
Line 191 ⟶ 195:
[[Category:Digital signal processors]]
[[Category:Digital signal processing]]
[[Category:GPGPU]]
[[Category:Parallel computing]]

Multidimensional DSP with GPU acceleration: Difference between revisions