Neural processing unit

As of 2016, AI accelerators are an emerging class of microprocessor designed to accelerate artificial neural networks, machine vision and other machine learning algorithms for robotics, the Internet of things and other data-intensive or sensor-driven tasks. They are frequently manycore designs, mirroring the massively parallel nature of biological neural networks. They are targeted at practical narrow AI applications rather than AGI research.

They are distinct from GPUs, which are commonly used for the same role, in that they lack any fixed-function units for graphics and generally focus on lower-precision arithmetic.

History

Other architectures, such as the Cell microprocessor, have exhibited features that significantly overlap with those of AI accelerators (support for packed low-precision arithmetic, a dataflow architecture, and a preference for throughput over latency). One or more DSPs have also been used as neural network accelerators. The physics processing unit was yet another attempt to fill the gap between CPU and GPU in PC hardware; however, physics tends to require 32-bit precision and up, whilst much lower precision is usually sufficient for AI.

Vendors of graphics processing units saw the opportunity and generalised their pipelines with specific support for GPGPU. This killed off the market for a dedicated physics accelerator, superseded Cell in video game consoles, and led to the use of GPUs in implementing convolutional neural networks such as AlexNet. As a result, as of 2016 most AI work is done on GPUs. However, at least a factor of 10 in efficiency^[1] can still be gained with a more specific design. The memory access pattern of AI calculations differs from that of graphics, with a more predictable but deeper dataflow, rather than 'gather' from texture maps and 'scatter' to frame buffers.
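The contrast in access patterns can be illustrated with a minimal Python sketch. The function names (gather, scatter, dense_layer, run_pipeline) are hypothetical and chosen for illustration only; they do not correspond to any vendor API. The first two functions use data-dependent indices, as in texture reads and frame-buffer writes, while the layered pipeline reads its operands in a fixed, predictable order and passes results from one stage to the next.

```python
def gather(texture, indices):
    """Graphics-style read: the addresses depend on the data in `indices`."""
    return [texture[i] for i in indices]


def scatter(frame_buffer, indices, values):
    """Graphics-style write: each value lands at a data-dependent address."""
    out = list(frame_buffer)
    for i, v in zip(indices, values):
        out[i] = v
    return out


def dense_layer(inputs, weights):
    """One network layer: every output is a dot product over inputs read in order."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def run_pipeline(inputs, layers):
    """Deep but predictable dataflow: each layer consumes the previous layer's output."""
    for weights in layers:
        inputs = dense_layer(inputs, weights)
    return inputs


gathered = gather([10, 20, 30, 40], [3, 0, 2])
scattered = scatter([0, 0, 0, 0], [2, 0], [7, 9])
result = run_pipeline([1.0, 2.0], [[[1.0, 0.0], [0.0, -1.0]], [[1.0, -1.0]]])
```

In the layered case the sequence of memory reads is known before any data arrives, which is the property a more specialised design can exploit.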

As of 2016, vendors are pushing their own terms, in the hope that their designs and APIs will dominate. After graphics accelerators emerged, the industry eventually adopted Nvidia's self-assigned term "GPU" as the collective noun for graphics accelerators, which had by then settled on an overall pipeline patterned around Direct3D. There is no consensus on the boundary between these devices, nor on the exact form they will take; however, several examples clearly aim to fill this new space.

Examples

Vision processing units
- e.g. Movidius Myriad 2, which is a manycore VLIW AI accelerator at its heart, complemented with fixed-function video units.

Tensor processing unit - presented as an accelerator for Google's TensorFlow framework, which is extensively used for convolutional neural networks. It focusses on a high volume of 8-bit precision arithmetic.
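As an illustration of what 8-bit precision arithmetic involves, the following Python sketch quantizes floating-point values to signed 8-bit integers and accumulates their products in a wider integer. The symmetric scaling scheme, the scale of 1/64, and the function names quantize and int8_dot are assumptions made for this example; the sketch is not a description of how the tensor processing unit itself is implemented.

```python
def quantize(values, scale):
    """Map floats to signed 8-bit integers, clamping to the range [-128, 127]."""
    return [max(-128, min(127, round(v / scale))) for v in values]


def int8_dot(a_q, b_q):
    """Multiply 8-bit operands and accumulate the products in a wider integer."""
    acc = 0
    for x, y in zip(a_q, b_q):
        acc += x * y
    return acc


SCALE = 1 / 64  # assumed scale: one integer step represents 1/64

weights = [0.3, -0.25, 0.7]
inputs = [1.0, 0.5, -1.0]

w_q = quantize(weights, SCALE)   # [19, -16, 45]
x_q = quantize(inputs, SCALE)    # [64, 32, -64]

acc = int8_dot(w_q, x_q)         # integer accumulator
approx = acc * SCALE * SCALE     # rescale back to a floating-point value
exact = sum(w * x for w, x in zip(weights, inputs))
```

Here the quantized result (-0.53125) differs from the floating-point result (about -0.525) by less than 0.01, a small error of the kind that low-precision designs trade for higher arithmetic throughput.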

SpiNNaker, a manycore design combining traditional ARM cores with an enhanced network fabric design specialised for simulating a large neural network.

TrueNorth, the most unconventional example, a manycore design based on spiking neurons rather than traditional arithmetic. The frequency of pulses represents signal intensity. As of 2016 there is no consensus amongst AI researchers on whether this is the right way to go,^[2] but some results are promising, with large energy savings demonstrated for vision tasks.
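The idea that pulse frequency represents signal intensity (rate coding) can be sketched in Python as follows. This is a simplified, deterministic accumulator chosen for illustration; the function names rate_encode and rate_decode are hypothetical, and the sketch is not TrueNorth's actual neuron model.

```python
def rate_encode(intensity, steps):
    """Return a 0/1 spike train whose spike frequency approximates `intensity` in [0, 1]."""
    spikes = []
    potential = 0.0
    for _ in range(steps):
        potential += intensity      # integrate the input at every time step
        if potential >= 1.0:        # fire once the threshold is reached
            spikes.append(1)
            potential -= 1.0        # subtract the threshold, keep the remainder
        else:
            spikes.append(0)
    return spikes


def rate_decode(spikes):
    """Recover the intensity as the fraction of time steps that contain a spike."""
    return sum(spikes) / len(spikes)


weak = rate_encode(0.25, 100)    # one spike every four steps
strong = rate_encode(0.75, 100)  # three spikes every four steps
```

A stronger signal produces a denser spike train, so a downstream unit only needs to count pulses over a time window rather than perform multi-bit arithmetic.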

Zeroth NPU, a design by Qualcomm aimed squarely at bringing speech and image recognition capabilities to mobile devices.

Adapteva Epiphany is targeted as a coprocessor, featuring a network-on-a-chip scratchpad memory model suited to the dataflow programming model required for many machine learning tasks.


[1] "Google boosts machine learning with TPU". Mentions 10x efficiency.
[2] "Yann LeCun on IBM TrueNorth".
