Revision as of 19:16, 19 February 2019 by Citation bot (edit summary: Alter: template type, isbn. Add: pmid, volume, isbn. Removed parameters.)
Revision as of 13:58, 27 January 2020 by Frap (edit summary: MOS:HEAD)
Line 9:

Processing multidimensional signals is a common problem in scientific research and engineering computation. Typically, the computational complexity of a DSP problem grows exponentially with the number of dimensions. With such high time and storage complexity, it is extremely difficult to process multidimensional signals in real time. Although many fast algorithms (e.g. [[Fast Fourier transform\|FFT]]) have been proposed for 1-D DSP problems, they are still not efficient enough to be adapted to high-dimensional DSP problems. It therefore remains hard to obtain the desired computation results with [[Digital signal processor\|digital signal processors (DSPs)]], and better algorithms and hardware architectures are needed to accelerate multidimensional DSP computations.

== Existing ~~Approaches~~approaches ==

Several common approaches to accelerating multidimensional DSP have been proposed and developed over the past decades.

=== Lower ~~Sampling~~sampling ~~Rate~~rate ===

A stopgap for meeting a real-time requirement in multidimensional DSP applications is to use a lower sampling rate, which reduces the number of samples to be processed at one time and thereby decreases the total processing time. However, by the [[Nyquist–Shannon sampling theorem\|sampling theorem]], this can lead to aliasing and poor-quality outputs. Some applications, such as military radar and medical imaging, demand highly precise and accurate results. In such cases, using a lower sampling rate to reduce the amount of computation in the multidimensional DSP domain is not always acceptable.

=== Digital ~~Signal~~signal ~~Processors (DSPs)~~processors ===

Digital signal processors are designed specifically to process vector operations. They have been widely used in DSP computations for decades. However, most digital signal processors are only capable of performing a few operations in parallel.
This kind of design is sufficient to accelerate audio processing (1-D signals) and image processing (2-D signals). However, with the large number of data samples in multidimensional signals, it is still not powerful enough to produce computation results in real time.

=== Supercomputer ~~Assistance~~assistance ===

Accelerating multidimensional DSP computations requires dedicated [[supercomputer]]s or [[Computer cluster\|cluster computers]] in some circumstances, e.g., [[weather forecasting]] and military radar. Nevertheless, dedicating supercomputers to DSP operations alone incurs considerable monetary cost and energy consumption, and it is neither practical nor suitable for all multidimensional DSP applications.

=== GPU ~~Acceleration~~acceleration ===

[[Graphics processing unit\|GPUs]] were originally devised to accelerate image processing and video stream rendering. Because modern GPUs can perform numeric computations in parallel at relatively low cost and with better energy efficiency, they are becoming a popular alternative to supercomputers for multidimensional DSP.<ref>{{cite book\|title = OpenCL: Make Ubiquitous Supercomputing Possible\|journal = 2010 12th IEEE International Conference on High Performance Computing and Communications (HPCC)\|date = 2010-09-01\|pages = 556–561\|doi = 10.1109/HPCC.2010.56\|first = Slo-Li\|last = Chu\|first2 = Chih-Chieh\|last2 = Hsiao\|isbn = 978-1-4244-8335-8}}</ref>

== GPGPU ~~Computations~~computations ==

[[File:SIMD GPGPU.jpg\|alt= Figure illustrating a SIMD/vector computation unit in GPGPUs.\|thumb\|GPGPU/SIMD computation model.]]

Line 31:

GPGPUs are able to perform an operation on multiple independent data elements concurrently with their vector or SIMD functional units. A modern GPGPU can spawn thousands of concurrent threads and process all threads in a batch manner.
Because of this, GPGPUs can readily be employed as DSP accelerators, since many DSP problems can be solved by [[Divide and conquer algorithms\|divide-and-conquer]] algorithms. A large-scale, complex DSP problem can be divided into many small numeric problems that are processed at the same time, so that the overall computation time is reduced significantly. For example, multiplying two {{math\|''M'' × ''M''}} matrices can be processed by {{math\|''M'' × ''M''}} concurrent threads on a GPGPU device without any output data dependency. Therefore, theoretically, GPGPU acceleration can yield up to an {{math\|''M'' × ''M''}} speedup compared with a traditional CPU or digital signal processor.

== GPU ~~Programming~~programming ~~Languages~~languages ==

Currently, several programming languages and interfaces support GPGPU programming.

Line 42:

The following figure illustrates the execution flow of launching an OpenCL program on a GPU device. The CPU first detects OpenCL devices (a GPU in this case) and then invokes a just-in-time compiler to translate the OpenCL source code into a target binary. The CPU then sends data to the GPU to perform computations. While the GPU is processing data, the CPU is free to process its own tasks.

=== C++ ~~Amp~~AMP ===

[[C++ AMP~~\|C++ Amp~~]] is a programming model proposed by [[Microsoft]].
C++ ~~Amp~~AMP is a [[C++]]-based library designed for programming SIMD processors.<ref>{{cite web\|title = C++ AMP (C++ Accelerated Massive Parallelism)\|url = https://msdn.microsoft.com/en-us/library/hh265137.aspx\|website = msdn.microsoft.com\|accessdate = 2015-11-05}}</ref>

=== ~~OpenAcc~~OpenACC ===

[[OpenACC~~\|OpenAcc~~]] is a programming standard for parallel computing developed by [[Cray]], CAPS, [[Nvidia\|NVIDIA]] and PGI.<ref>{{cite web\|title = OpenACC Home {{!}} www.openacc.org\|url = http://www.openacc.org/\|website = www.openacc.org\|accessdate = 2015-11-05}}</ref> OpenACC targets programming for heterogeneous CPU and GPU systems with [[C (programming language)\|C]], [[C++]], and [[Fortran]] extensions.

== Examples of GPU ~~Programming~~programming for ~~Multidimensional~~multidimensional DSP ==

=== {{math\|''m'' × ''m''}} ~~Matrix~~matrix ~~Multiplication~~multiplication ===

Suppose {{math\|'''A'''}} and {{math\|'''B'''}} are two {{math\|''m'' × ''m''}} matrices and we would like to compute {{math\|1 = '''C''' = '''A''' × '''B'''}}.

Line 116:

</syntaxhighlight>

=== Multidimensional ~~Convolution (M-D Convolution)~~convolution ===

Convolution is a frequently used operation in DSP. Computing the 2-D convolution of two {{math\|''m'' × ''m''}} signals requires up to {{math\|''m''<sup>2</sup>}} multiplications and {{math\|''m''<sup>2</sup> – 1}} additions for each output element. That is, the overall time complexity is {{math\|Θ(''m''<sup>4</sup>)}} for the entire output signal. As the following OpenCL example shows, with GPGPU acceleration, the total computation time effectively decreases to {{math\|Θ(''m''<sup>2</sup>)}} since all output elements are data independent.
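The data independence claimed above can be illustrated with a small sequential sketch. It is written in Python rather than OpenCL and is not the article's own example. The function names `conv2d_element` and `conv2d` are hypothetical, and the sketch assumes the full convolution with both signals treated as zero outside their support. The body of `conv2d_element` corresponds to the work a single GPU thread would do for its own output index.

```python
def conv2d_element(x, h, n1, n2):
    """Compute one output sample y[n1][n2] of the full 2-D convolution.

    This is the work one GPU thread would do for its own (n1, n2):
    it reads x and h but never reads any other output element.
    """
    acc = 0
    for k1 in range(len(x)):
        for k2 in range(len(x[0])):
            i1, i2 = n1 - k1, n2 - k2
            # Terms that index h outside its support contribute zero.
            if 0 <= i1 < len(h) and 0 <= i2 < len(h[0]):
                acc += x[k1][k2] * h[i1][i2]
    return acc


def conv2d(x, h):
    """Sequential stand-in for a parallel launch: one call per output element."""
    rows = len(x) + len(h) - 1
    cols = len(x[0]) + len(h[0]) - 1
    return [[conv2d_element(x, h, n1, n2) for n2 in range(cols)]
            for n1 in range(rows)]


# Small usage example with 2 x 2 inputs (values chosen for illustration).
x = [[1, 2], [3, 4]]
h = [[1, 0], [0, 1]]
y = conv2d(x, h)
print(y)  # [[1, 2, 0], [3, 5, 2], [0, 3, 4]]
```

Because each call to `conv2d_element` is independent of the others, the two loops in `conv2d` could be replaced by one thread per output element. Each thread would still perform up to {{math\|''m''<sup>2</sup>}} multiply-add steps, which is where the {{math\|Θ(''m''<sup>2</sup>)}} parallel running time comes from, provided enough threads are available.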
Line 153:

<math>y(n_1,n_2,...,n_s)=x(n_1,n_2,...,n_s)**h(n_1,n_2,...,n_s)=\sum_{k_1=0}^{m_1-1}\sum_{k_2=0}^{m_2-1}...\sum_{k_s=0}^{m_s-1}x(k_1, k_2,...,k_s)h(n_1-k_1,n_2-k_2,...,n_s-k_s)</math>

=== Multidimensional ~~Discrete~~discrete ~~Time~~time ~~Fourier~~fourier ~~Transform~~transform (M-D DTFT) ===

In addition to convolution, the [[Fourier transform\|discrete-time Fourier transform (DTFT)]] is another technique that is often used in system analysis.

Line 178:

</syntaxhighlight>

== Real ~~Applications~~applications ==

=== Digital ~~Filter~~filter ~~Design~~design ===

Designing a multidimensional digital filter is a big challenge, especially for [[Infinite impulse response\|IIR]] filters. Typically it relies on computers to solve difference equations and obtain a set of approximate solutions. As GPGPU computation has become popular, several adaptive algorithms have been proposed to design multidimensional [[Finite impulse response\|FIR]] and/or [[Infinite impulse response\|IIR]] filters by means of GPGPUs.<ref>{{cite book\|title = GPU-efficient Recursive Filtering and Summed-area Tables\|publisher = ACM\|journal = Proceedings of the 2011 SIGGRAPH Asia Conference\|date = 2011-01-01\|location = New York, NY, USA\|isbn = 978-1-4503-0807-6\|pages = 176:1–176:12\|series = SA '11\|doi = 10.1145/2024156.2024210\|first = Diego\|last = Nehab\|first2 = André\|last2 = Maximo\|first3 = Rodolfo S.\|last3 = Lima\|first4 = Hugues\|last4 = Hoppe}}</ref><ref>{{cite book\|title = GPU Gems 2: Programming Techniques For High-Performance Graphics And General-Purpose Computation\|last = Pharr\|first = Matt\|publisher = Pearson Addison Wesley\|year = 2005\|isbn = 978-0-321-33559-3\|location = \|pages = \|last2 = Fernando\|first2 = Randima}}</ref><ref>{{cite book\|title = GPU Computing Gems Emerald Edition\|last = Hwu\|first = Wen-mei W.\|publisher = Morgan Kaufmann Publishers Inc.\|year = 2011\|isbn = 978-0-12-385963-1\|location = San Francisco, CA, USA\|pages = }}</ref>
=== Radar ~~Signal~~signal ~~Reconstruction~~reconstruction and ~~Analysis~~analysis ===

Radar systems usually need to reconstruct numerous 3-D or 4-D data samples in real time. Traditionally, particularly in the military, this has required supercomputers. Nowadays, GPGPUs are also employed in place of supercomputers to process radar signals. For example, processing [[Synthetic aperture radar\|synthetic aperture radar (SAR)]] signals usually involves multidimensional [[Fast Fourier transform\|FFT]] computations.<ref>{{cite book\|title = Processing of synthetic Aperture Radar data with GPGPU\|journal = IEEE Workshop on Signal Processing Systems, 2009. SiPS 2009\|date = 2009-10-01\|pages = 309–314\|doi = 10.1109/SIPS.2009.5336272\|first = C.\|last = Clemente\|first2 = M.\|last2 = Di Bisceglie\|first3 = M.\|last3 = Di Santo\|first4 = N.\|last4 = Ranaldo\|first5 = M.\|last5 = Spinelli\|isbn = 978-1-4244-4335-2}}</ref><ref>{{cite book\|title = An Efficient SAR Processor Based on GPU via CUDA\|journal = 2nd International Congress on Image and Signal Processing, 2009. CISP '09\|date = 2009-10-01\|pages = 1–5\|doi = 10.1109/CISP.2009.5304418\|first = Bin\|last = Liu\|first2 = Kaizhi\|last2 = Wang\|first3 = Xingzhao\|last3 = Liu\|first4 = Wenxian\|last4 = Yu\|isbn = 978-1-4244-4129-7}}</ref><ref>{{cite book\|title = Implementing radar algorithms on CUDA hardware\|journal = Mixed Design of Integrated Circuits Systems (MIXDES), 2014 Proceedings of the 21st International Conference\|date = 2014-06-01\|pages = 455–458\|doi = 10.1109/MIXDES.2014.6872240\|first = P.\|last = Monsurro\|first2 = A.\|last2 = Trifiletti\|first3 = F.\|last3 = Lannutti\|isbn = 978-83-63578-05-3}}</ref> GPGPUs can be used to rapidly perform the FFT and/or inverse FFT in these kinds of applications.

=== Self-~~Driving~~driving ~~Car~~cars ===

Many [[self-driving ~~cars~~car]]s apply 3-D image recognition techniques to control the vehicle automatically.
Clearly, to keep up with the fast-changing exterior environment, the recognition and decision processes must be done in real time. GPGPUs are well suited to this goal.<ref>{{cite book\|title = Accelerating Cost Aggregation for Real-Time Stereo Matching\|journal = 2012 IEEE 18th International Conference on Parallel and Distributed Systems (ICPADS)\|date = 2012-12-01\|pages = 472–481\|doi = 10.1109/ICPADS.2012.71\|first = Jianbin\|last = Fang\|first2 = A.L.\|last2 = Varbanescu\|first3 = Jie\|last3 = Shen\|first4 = H.\|last4 = Sips\|first5 = G.\|last5 = Saygili\|first6 = L.\|last6 = van der Maaten\|isbn = 978-1-4673-4565-1}}</ref>

=== Medical ~~Image~~image ~~Processing~~processing ===

For accurate diagnosis, 2-D or 3-D medical signals, such as [[ultrasound]], [[X-ray]], [[Magnetic resonance imaging\|MRI]], and [[CT scan\|CT]], often require very high sampling rates and image resolutions to reconstruct images. By applying the computational power of GPGPUs, it has been shown that better-quality medical images can be acquired.<ref>{{cite web\|title = Medical Imaging{{!}}NVIDIA\|url = http://www.nvidia.com/object/medical_imaging.html\|website = www.nvidia.com\|accessdate = 2015-11-07}}</ref><ref>{{cite book\|title = GPU-based Volume Rendering for Medical Image Visualization\|journal = Engineering in Medicine and Biology Society, 2005. IEEE-EMBS 2005. 27th Annual International Conference of the\|volume = 5\|date = 2005-01-01\|pages = 5145–5148\|doi = 10.1109/IEMBS.2005.1615635\|pmid = 17281405\|first = Yang\|last = Heng\|first2 = Lixu\|last2 = Gu\|isbn = 978-0-7803-8741-6}}</ref>

Multidimensional DSP with GPU acceleration: Difference between revisions