[[File:SIMD GPGPU.jpg|alt=Figure illustrating a SIMD/vector computation unit in GPGPUs.|thumb|GPGPU/SIMD computation model.]]
Modern GPU designs are mainly based on the [[SIMD]] computation paradigm.<ref>{{Cite journal|title = NVIDIA Tesla: A Unified Graphics and Computing Architecture|url = http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=4523358&url=http%253A%252F%252Fieeexplore.ieee.org%252Fxpls%252Fabs_all.jsp%253Farnumber%253D4523358|journal = IEEE Micro|date = 2008-03-01|issn = 0272-1732|pages = 39-55|volume = 28|issue = 2|doi = 10.1109/MM.2008.31|first = E.|last = Lindholm|first2 = J.|last2 = Nickolls|first3 = S.|last3 = Oberman|first4 = J.|last4 = Montrym}}</ref><ref>{{Cite book|title = Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)|last = Kim|first = Hyesoon|publisher = Morgan & Claypool Publishers|year = 2012|isbn = 9781608459544|location = |pages = |last2 = Vuduc|first2 = Richard|last3 = Baghsorkhi|first3 = Sara|last4 = Choi|first4 = Jee|last5 = Hwu|first5 = Wen-Mei W.|editor-last = Hill|editor-first = Mark D.|doi = 10.2200/S00451ED1V01Y201209CAC020}}</ref> GPGPUs of this type can perform an operation on multiple independent data elements concurrently using their vector or SIMD functional units. A modern GPGPU can spawn thousands of concurrent threads and process them in batches. Because many DSP problems can be solved by [[Divide and conquer algorithms|divide-and-conquer]] algorithms, GPGPUs are well suited for use as DSP accelerators. A large, complex DSP problem can be divided into many small numeric subproblems that are processed at the same time, so that the overall execution time can be reduced significantly. For example, multiplying two {{math|''M'' × ''M''}} matrices can be carried out by {{math|''M'' × ''M''}} concurrent threads on a GPGPU device, because each element of the output matrix can be computed independently of the others. In theory, GPGPU acceleration can therefore yield a speedup of up to {{math|''M'' × ''M''}} over a sequential implementation on a traditional CPU or digital signal processor.
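The independence of the output elements in the matrix example can be illustrated with a minimal sketch. It is a CPU-side emulation in Python, not GPU code: each task stands in for one GPGPU thread and computes a single output element. The names <code>matmul_element</code> and <code>parallel_matmul</code>, and the <code>max_workers</code> value, are hypothetical and chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def matmul_element(a, b, row, col):
    """Work of one logical thread: a single dot product for C[row][col]."""
    return sum(a[row][k] * b[k][col] for k in range(len(b)))


def parallel_matmul(a, b, max_workers=8):
    """Compute C = A x B for square M x M matrices, one task per output element."""
    m = len(a)
    # M * M independent (row, col) pairs, listed in row-major order.
    indices = list(product(range(m), range(m)))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # No task reads another task's output, so they can run in any order.
        values = list(pool.map(lambda rc: matmul_element(a, b, *rc), indices))
    # Reassemble the flat, row-major result list into an M x M matrix.
    return [values[r * m:(r + 1) * m] for r in range(m)]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = parallel_matmul(a, b)
print(c)  # [[19, 22], [43, 50]]
```

On a real GPGPU the same per-element function would typically be written as a kernel and launched over an {{math|''M'' × ''M''}} grid of threads. The thread pool here only mimics that structure and is not expected to show a comparable speedup.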