{{Short description|Hardware acceleration unit for artificial intelligence tasks}}
{{Use American English|date=January 2019}}
{{Use mdy dates|date=October 2021}}
 
A '''neural processing unit''' ('''NPU'''), also known as '''AI accelerator''' or '''deep learning processor''', is a class of specialized [[hardware acceleration|hardware accelerator]]<ref>{{cite web |url=https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |title=Intel unveils Movidius Compute Stick USB AI Accelerator |date=July 21, 2017 |access-date=August 11, 2017 |archive-url=https://web.archive.org/web/20170811193632/https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |archive-date=August 11, 2017 }}</ref> or computer system<ref>{{cite web |url=https://insidehpc.com/2017/06/inspurs-unveils-gx4-ai-accelerator/ |title=Inspurs unveils GX4 AI Accelerator |date=June 21, 2017}}</ref><ref>{{citation |title=Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors |last=Wiggers |first=Kyle |date=November 6, 2019 |url=https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |publication-date=November 6, 2019 |orig-date=2019 |archive-url=https://web.archive.org/web/20200306120524/https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |archive-date=March 6, 2020 |access-date=March 14, 2020}}</ref> designed to accelerate [[artificial intelligence]] (AI) and [[machine learning]] applications, including [[artificial neural network]]s and [[computer vision]].
They are distinct from [[GPU]]s, which are commonly used for the same role, in that they lack [[fixed function unit]]s for graphics and generally focus on low-precision arithmetic.
 
== Use ==
Their purpose is either to efficiently execute already-trained AI models (inference) or to train AI models. Their applications include [[algorithm]]s for [[robotics]], the [[Internet of things]], and [[data (computing)|data]]-intensive or sensor-driven tasks.<ref>{{cite web |url=https://www.eetimes.com/google-designing-ai-processors/ |title=Google Designing AI Processors|date=May 18, 2016 }} Google using its own AI accelerators.</ref> They are often [[Manycore processor|manycore]] or [[Spatial architecture|spatial]] designs, mirroring the massively parallel nature of biological neural networks, and focus on [[precision (computer science)|low-precision]] arithmetic, novel [[dataflow architecture]]s, or [[in-memory computing]] capability. {{As of|2024}}, a typical datacenter-grade AI [[integrated circuit]] chip, the H100 GPU, [[transistor count|contains tens of billions]] of [[MOSFET]]s.<ref>{{cite web|url=https://www.datacenterdynamics.com/en/news/nvidia-reveals-new-hopper-h100-gpu-with-80-billion-transistors/|title=Nvidia reveals new Hopper H100 GPU, with 80 billion transistors|last=Moss|first=Sebastian|date=2022-03-23|website=Data Center Dynamics|access-date=2024-01-30}}</ref>
 
=== Consumer devices ===
AI accelerators are used in mobile devices such as Apple [[iPhone]]s, [[Huawei]] devices, and [[Google Pixel]] smartphones,<ref>{{Cite web|url=https://consumer.huawei.com/en/press/news/2017/ifa2017-kirin970|title=HUAWEI Reveals the Future of Mobile AI at IFA}}</ref> and in AMD [[AI engine|AI engines]]<ref>{{Cite book |last=Brown |first=Nick |chapter=Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation |date=2023-02-12 |title=Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays |chapter-url=https://dl.acm.org/doi/10.1145/3543622.3573047 |series=FPGA '23 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=91–97 |doi=10.1145/3543622.3573047 |isbn=978-1-4503-9417-8|arxiv=2301.13016 }}</ref> in Versal devices and NPUs. NPUs are also found in many [[Apple silicon]], [[Qualcomm]], [[Samsung]], and [[Google Tensor]] smartphone processors.<ref>{{Cite web| title=Snapdragon 8 Gen 3 mobile platform | url=https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-url=https://web.archive.org/web/20231025162610/https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-date=2023-10-25}}</ref>
 
More recently (circa 2022), NPUs have been added to computer processors from [[Intel]],<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intels-lunar-lake-processors-arriving-q3-2024.html|title=Intel's Lunar Lake Processors Arriving Q3 2024|website=Intel|date=May 20, 2024 }}</ref> [[AMD]],<ref>{{cite web|title=AMD XDNA Architecture|url=https://www.amd.com/en/technologies/xdna.html}}</ref> and Apple silicon.<ref>{{Cite web |title=Deploying Transformers on the Apple Neural Engine |url=https://machinelearning.apple.com/research/neural-engine-transformers |access-date=2023-08-24 |website=Apple Machine Learning Research |language=en-US}}</ref> All models of Intel [[Meteor Lake]] processors have a built-in ''versatile processor unit'' (''VPU'') for accelerating [[statistical inference|inference]] for computer vision and deep learning.<ref>{{Cite web|url=https://www.pcmag.com/news/intel-to-bring-a-vpu-processor-unit-to-14th-gen-meteor-lake-chips|title=Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips|website=PCMAG|date=August 2022 }}</ref>
 
On consumer devices, the NPU is intended to be small and power-efficient, yet reasonably fast when used to run small models. To this end, NPUs are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common performance metric is trillions of operations per second (TOPS), though this metric alone does not specify which kind of operations are being performed.<ref>{{cite web |title=A guide to AI TOPS and NPU performance metrics |url=https://www.qualcomm.com/news/onq/2024/04/a-guide-to-ai-tops-and-npu-performance-metrics}}</ref>
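
The low-bitwidth data types above are typically produced by quantizing a model's 32-bit floating-point values. The following is a minimal illustrative sketch, not any particular vendor's toolchain: symmetric INT8 quantization represents a tensor as 8-bit integers plus a single shared scale, with integer products accumulated in 32 bits, mirroring how NPU datapaths commonly operate.

<syntaxhighlight lang="python">
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization: x is approximated by scale * q."""
    scale = np.max(np.abs(x)) / 127.0                 # largest magnitude maps to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Toy weights and activations standing in for one layer of a real model.
w = np.random.randn(64, 64).astype(np.float32)
a = np.random.randn(64).astype(np.float32)

qw, sw = quantize_int8(w)
qa, sa = quantize_int8(a)

# Integer multiply-accumulate in int32, then a single rescale back to float.
y_quant = (qw.astype(np.int32) @ qa.astype(np.int32)) * (sw * sa)
y_exact = w @ a
print(np.max(np.abs(y_quant - y_exact)))              # small quantization error
</syntaxhighlight>
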
=== Datacenters ===
Accelerators are used in [[cloud computing]] servers, including [[tensor processing unit]]s (TPU) in [[Google Cloud Platform]]<ref>{{Cite journal|date=2017-06-24|title=In-Datacenter Performance Analysis of a Tensor Processing Unit|journal=ACM SIGARCH Computer Architecture News|volume=45|issue=2|pages=1–12|language=EN|doi=10.1145/3140659.3080246|doi-access=free |last1=Jouppi |first1=Norman P. |last2=Young |first2=Cliff |last3=Patil |first3=Nishant |last4=Patterson |first4=David |last5=Agrawal |first5=Gaurav |last6=Bajwa |first6=Raminder |last7=Bates |first7=Sarah |last8=Bhatia |first8=Suresh |last9=Boden |first9=Nan |last10=Borchers |first10=Al |last11=Boyle |first11=Rick |last12=Cantin |first12=Pierre-luc |last13=Chao |first13=Clifford |last14=Clark |first14=Chris |last15=Coriell |first15=Jeremy |last16=Daley |first16=Mike |last17=Dau |first17=Matt |last18=Dean |first18=Jeffrey |last19=Gelb |first19=Ben |last20=Ghaemmaghami |first20=Tara Vazir |last21=Gottipati |first21=Rajendra |last22=Gulland |first22=William |last23=Hagmann |first23=Robert |last24=Ho |first24=C. Richard |last25=Hogberg |first25=Doug |last26=Hu |first26=John |last27=Hundt |first27=Robert |last28=Hurt |first28=Dan |last29=Ibarz |first29=Julian |last30=Jaffey |first30=Aaron |display-authors=1 |arxiv=1704.04760 }}</ref> and [[Trainium]] and [[Inferentia]] chips in [[Amazon Web Services]].<ref>{{cite web | title = How silicon innovation became the 'secret sauce' behind AWS's success| website = Amazon Science| date = July 27, 2022| url = https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success| access-date = July 19, 2024}}</ref> Many vendor-specific terms exist for devices in this category, and it is an [[emerging technologies|emerging technology]] without a [[dominant design]].
 
[[Graphics processing units]] designed by companies such as [[Nvidia]] and [[AMD]] often include AI-specific hardware, and are commonly used as AI accelerators, both for [[Machine learning|training]] and [[Inference engine|inference]].<ref>{{cite web| last1 = Patel| first1 = Dylan| last2 = Nishball| first2 = Daniel| last3 = Xie| first3 = Myron| title = Nvidia's New China AI Chips Circumvent US Restrictions| url=https://www.semianalysis.com/p/nvidias-new-china-ai-chips-circumvent| website = SemiAnalysis| date=2023-11-09| access-date=2024-02-07}}</ref>
 
== Programming ==
Mobile NPU vendors typically provide their own [[application programming interface]] such as the Snapdragon Neural Processing Engine. An operating system or a higher-level library may provide a more generic interface such as TensorFlow Lite with LiteRT Next (Android) or CoreML (iOS, macOS).
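
For illustration, a model can be handed to a vendor's NPU delegate through the TensorFlow Lite interpreter. The sketch below uses the public <code>tf.lite</code> Python API, but the delegate library and model file names are placeholders; the real names are vendor- and device-specific.

<syntaxhighlight lang="python">
import numpy as np
import tensorflow as tf

# Load a vendor NPU delegate; the library name here is purely illustrative.
delegate = tf.lite.experimental.load_delegate("libvendor_npu_delegate.so")

interpreter = tf.lite.Interpreter(
    model_path="model_int8.tflite",      # hypothetical quantized model
    experimental_delegates=[delegate],   # supported ops run on the NPU
)
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()                     # unsupported ops fall back to the CPU
print(interpreter.get_tensor(out["index"]))
</syntaxhighlight>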
 
Consumer CPU-integrated NPUs are accessible through vendor-specific APIs: AMD (Ryzen AI), Intel (OpenVINO), and Apple silicon (CoreML){{efn|MLX builds atop the CPU and GPU parts, not the Apple Neural Engine (ANE) part of Apple Silicon chips. The relatively good performance is due to the use of a large, fast [[unified memory]] design.}} each have their own APIs, which higher-level libraries can build upon.
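
As a concrete sketch of one such API, the following uses OpenVINO's Python interface to select an NPU device by name. The model file is a placeholder, and whether an "NPU" device appears depends on the OpenVINO version and installed drivers; both are assumptions here.

<syntaxhighlight lang="python">
import numpy as np
import openvino as ov  # OpenVINO 2023+ Python API

core = ov.Core()
print(core.available_devices)                 # lists "NPU" only if drivers expose one

model = core.read_model("model.xml")          # hypothetical OpenVINO IR model
compiled = core.compile_model(model, "NPU")   # device name is an assumption

request = compiled.create_infer_request()
x = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input shape
result = request.infer({0: x})                # outputs keyed by output port
</syntaxhighlight>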
 
GPUs generally use existing [[GPGPU]] pipelines such as CUDA and OpenCL, adapted for lower precisions. Custom-built systems such as the Google TPU use private interfaces.
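
As a sketch of what "adapted for lower precisions" looks like in practice, the following runs inference under reduced precision with PyTorch on a CUDA GPU; the two-layer model is a stand-in for any float32 network, and a CUDA-capable GPU is assumed.

<syntaxhighlight lang="python">
import torch

# Stand-in model; any float32 network is handled the same way.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.ReLU()).cuda().eval()
x = torch.randn(32, 512, device="cuda")

with torch.no_grad(), torch.autocast("cuda", dtype=torch.float16):
    y = model(x)  # matrix multiplies execute in FP16 (on tensor cores where available)

print(y.dtype)    # torch.float16
</syntaxhighlight>
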
== History ==
Other architectures, such as the [[Cell microprocessor]], have exhibited features that significantly overlap with AI accelerators: support for packed low-precision arithmetic, a dataflow architecture, and a focus on throughput over latency. One or more [[DSP]]s have also been used as neural network accelerators. The [[physics processing unit]] was yet another attempt to fill the gap between [[CPU]] and GPU in PC hardware; however, physics tends to require 32-bit precision and up, whereas much lower precision is optimal for AI. Vendors of graphics processing units saw the opportunity and generalized their pipelines with specific support for [[GPGPU]], which killed off the market for dedicated physics accelerators and superseded Cell in video game consoles; as of 2016, most AI work was done on GPUs. However, at least a factor of 10 in efficiency<ref>{{cite web|title=google boosts machine learning with TPU|url=http://techreport.com/news/30155/google-boosts-machine-learning-with-its-tensor-processing-unit}} mentions 10x efficiency</ref> can be gained with more dedicated designs.

As of 2016, vendors were pushing their own terms, in the hope that their designs and [[API]]s would dominate. After [[graphics accelerator]]s emerged, the industry eventually adopted [[Nvidia]]'s self-assigned term "[[GPU]]" as the collective noun for graphics accelerators, which had settled on an overall pipeline patterned around [[Direct3D]]. There was no consensus on the boundary between these devices, nor on the exact form they would take, but several examples clearly aimed to fill this new space.

=== Examples ===
* [[Vision processing unit]]s, such as the [[Movidius Myriad 2]]: at its heart a manycore [[VLIW]] AI accelerator, complemented with video [[fixed function unit]]s.
* [[Tensor processing unit]]: presented as an accelerator for Google's [[TensorFlow]] framework, which is extensively used for [[convolutional neural network]]s; focuses on a high volume of [[8-bit]] precision arithmetic.
* [[SpiNNaker]]: a manycore design combining traditional [[ARM]] cores with an enhanced network fabric specialized for simulating a large neural network.
* [[TrueNorth]]: the most unconventional example, a manycore design based on [[spiking neuron]]s rather than traditional arithmetic, in which the frequency of pulses represents signal intensity. As of 2016, there was no consensus among AI researchers on whether this was the right way to go,<ref>{{cite web|title=yann lecun on IBM truenorth|url=https://www.facebook.com/yann.lecun/posts/10152184295832143}}</ref> but some results were promising, with large energy savings demonstrated for vision tasks.
* [[Zeroth NPU]]: a design by [[Qualcomm]] aimed squarely at bringing speech and image recognition capabilities to mobile devices.
* [[Adapteva epiphany]]: targeted as a [[coprocessor]], featuring a [[network on a chip]] [[scratchpad memory]] model, suited to the dataflow programming style required by many machine learning tasks.
 
==Notes==
{{notelist}}
 
== References ==
{{Reflist}}
 
== External links ==
*[https://www.nextplatform.com/2016/04/05/nvidia-puts-accelerator-metal-pascal/ Nvidia Puts The Accelerator To The Metal With Pascal], The Next Platform
*[http://eyeriss.mit.edu/ Eyeriss Project], MIT
 
{{Hardware acceleration}}
[[Category:Application-specific integrated circuits]]
[[Category:Neural processing units| ]]
[[Category:Coprocessors]]
[[Category:Computer optimization]]
[[Category:Gate arrays]]
[[Category:Deep learning]]