Neural processing unit: Difference between revisions

=== History ===
 
Other architectures such as the [[Cell microprocessor]] have exhibited features that significantly overlap with AI accelerators (support for packed low-precision arithmetic, a dataflow architecture, and prioritising throughput over latency). One or more [[DSP]]s have also been used as neural network accelerators.<ref>{{cite web|title=Convolutional neural network demo from 1993 featuring DSP32 accelerator|url=https://www.youtube.com/watch?v=FwFduRA_L6Q}}</ref>
The [[Physics processing unit]] was yet another example of an attempt to fill the gap between [[CPU]] and GPU in PC hardware; however, physics simulation tends to require 32-bit precision and up, whereas much lower precision is optimal for AI.
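The precision contrast can be made concrete with a small, purely illustrative sketch (the vector size and the symmetric int8 quantisation scheme are assumptions for illustration, not drawn from this article): a dot product computed in 32-bit floating point, as physics workloads typically require, next to an 8-bit quantised version of the kind low-precision AI hardware favours.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch only: compare a float32 dot product with an
# int8-quantised one of the kind AI accelerators exploit.
rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)   # weights
x = rng.standard_normal(256).astype(np.float32)   # activations

full_precision = float(np.dot(w, x))              # 32-bit computation

# Symmetric 8-bit quantisation: map each tensor onto the int8 range
# and keep the per-tensor scale needed to undo it afterwards.
scale_w = np.abs(w).max() / 127.0
scale_x = np.abs(x).max() / 127.0
w_q = np.round(w / scale_w).astype(np.int8)
x_q = np.round(x / scale_x).astype(np.int8)

# Accumulate in a wider integer type, then rescale back to real units.
acc = int(np.dot(w_q.astype(np.int32), x_q.astype(np.int32)))
low_precision = acc * scale_w * scale_x

print(full_precision, low_precision)  # the two results agree to roughly 1%
</syntaxhighlight>

The small error introduced by quantisation is usually tolerable for neural network inference, which is why narrow integer datapaths are attractive there but not for physics simulation.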
 
Vendors of graphics processing units saw the opportunity and generalised their pipelines with specific support for [[GPGPU]], which killed off the market for dedicated physics accelerators, superseded Cell in video game consoles, and led to their use in implementing [[convolutional neural network]]s such as [[AlexNet]]; as of 2016, most AI work is therefore done on GPUs. However, at least a factor of 10 in efficiency<ref>{{cite web|title=Google boosts machine learning with TPU|url=http://techreport.com/news/30155/google-boosts-machine-learning-with-its-tensor-processing-unit}} mentions 10x efficiency</ref> can still be gained with an increasingly specific design. The [[memory access pattern]] of AI calculations differs from graphics, with a more predictable but deeper [[dataflow]], rather than 'gather' from texture maps and 'scatter' to frame buffers, as sketched below.
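The contrast between the two access patterns can be shown with a toy 1-D convolution; the example below is illustrative only and not taken from any particular accelerator design.

<syntaxhighlight lang="python">
import numpy as np

# Illustrative sketch only: the address stream of a 1-D convolution is a
# fixed sliding window, so an accelerator can prefetch and stream operands,
# unlike the data-dependent 'gather' from texture maps or 'scatter' to
# frame buffers typical of graphics workloads.
signal = np.arange(16, dtype=np.float32)
kernel = np.array([1.0, 0.0, -1.0], dtype=np.float32)

outputs = []
for out_idx in range(len(signal) - len(kernel) + 1):
    # Addresses out_idx, out_idx+1, out_idx+2 are known before execution
    # and follow a fixed stride, so the dataflow can be scheduled statically.
    window = signal[out_idx : out_idx + len(kernel)]
    outputs.append(float(np.dot(window, kernel)))

print(outputs)  # each output depends only on a contiguous, predictable window
</syntaxhighlight>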