{{Use mdy dates|date=October 2021}}
 
A '''neural processing unit''' ('''NPU'''), also known as an '''AI accelerator''' or '''deep learning processor''', is a class of specialized [[hardware acceleration|hardware accelerator]]<ref>{{cite web |url=https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |title=Intel unveils Movidius Compute Stick USB AI Accelerator |date=July 21, 2017 |access-date=August 11, 2017 |archive-url=https://web.archive.org/web/20170811193632/https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |archive-date=August 11, 2017 }}</ref> or computer system<ref>{{cite web |url=https://insidehpc.com/2017/06/inspurs-unveils-gx4-ai-accelerator/ |title=Inspurs unveils GX4 AI Accelerator |date=June 21, 2017}}</ref><ref>{{citation |title=Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors |last=Wiggers |first=Kyle |date=November 6, 2019 |url=https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |publication-date=November 6, 2019 |orig-date=2019 |archive-url=https://web.archive.org/web/20200306120524/https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |archive-date=March 6, 2020 |access-date=March 14, 2020}}</ref> designed to accelerate [[artificial intelligence]] (AI) and [[machine learning]] applications, including [[artificial neural network]]s and [[computer vision]].
 
==Use==
NPUs can be used either to efficiently execute already trained AI models (inference) or to train AI models. Typical applications include [[algorithm]]s for [[robotics]], the [[Internet of things]], and other [[data (computing)|data]]-intensive or sensor-driven tasks.<ref>{{cite web |url=https://www.eetimes.com/google-designing-ai-processors/ |title=Google Designing AI Processors|date=May 18, 2016 }} Google using its own AI accelerators.</ref> They are often [[Manycore processor|manycore]] or [[Spatial architecture|spatial]] designs and focus on [[precision (computer science)|low-precision]] arithmetic, novel [[dataflow architecture]]s, or [[in-memory computing]] capability. {{As of|2024}}, a typical datacenter-grade AI [[integrated circuit]] chip, such as the Nvidia H100 GPU, [[transistor count|contains tens of billions]] of [[MOSFET]]s.<ref>{{cite web|url=https://www.datacenterdynamics.com/en/news/nvidia-reveals-new-hopper-h100-gpu-with-80-billion-transistors/|title=Nvidia reveals new Hopper H100 GPU, with 80 billion transistors|last=Moss|first=Sebastian|date=2022-03-23|website=Data Center Dynamics|access-date=2024-01-30}}</ref>
 
=== Consumer devices ===
AI accelerators are used in mobile devices such as Apple [[iPhone]]s and [[Huawei]] and [[Google Pixel]] smartphones,<ref>{{Cite web|url=https://consumer.huawei.com/en/press/news/2017/ifa2017-kirin970|title=HUAWEI Reveals the Future of Mobile AI at IFA}}</ref> and are integrated into many [[Apple silicon]], [[Qualcomm]], [[Samsung]], and [[Google Tensor]] smartphone processors.<ref>{{Cite web| title=Snapdragon 8 Gen 3 mobile platform | url=https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-url=https://web.archive.org/web/20231025162610/https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-date=2023-10-25}}</ref> AMD's [[AI engine|AI engines]], used in its Versal platform, are a related class of accelerator.<ref>{{Cite book |last=Brown |first=Nick |chapter=Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation |date=2023-02-12 |title=Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays |chapter-url=https://dl.acm.org/doi/10.1145/3543622.3573047 |series=FPGA '23 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=91–97 |doi=10.1145/3543622.3573047 |isbn=978-1-4503-9417-8|arxiv=2301.13016 }}</ref>
 
Since around 2022, NPUs have also been added to computer processors from [[Intel]],<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intels-lunar-lake-processors-arriving-q3-2024.html|title=Intel's Lunar Lake Processors Arriving Q3 2024|website=Intel|date=May 20, 2024 }}</ref> [[AMD]],<ref>{{cite web|title=AMD XDNA Architecture|url=https://www.amd.com/en/technologies/xdna.html}}</ref> and [[Apple silicon]].<ref>{{Cite web |title=Deploying Transformers on the Apple Neural Engine |url=https://machinelearning.apple.com/research/neural-engine-transformers |access-date=2023-08-24 |website=Apple Machine Learning Research |language=en-US}}</ref> All models of Intel [[Meteor Lake]] processors have a built-in ''versatile processor unit'' (''VPU'') for accelerating [[statistical inference|inference]] for computer vision and deep learning.<ref>{{Cite web|url=https://www.pcmag.com/news/intel-to-bring-a-vpu-processor-unit-to-14th-gen-meteor-lake-chips|title=Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips|website=PCMAG|date=August 2022 }}</ref>
 
Since 2017, several CPUs and systems-on-chip have included on-die NPUs; examples include the [[Apple A11]] and the Intel [[Meteor Lake (microarchitecture)|Meteor Lake]] and [[Lunar Lake]] processors.

On consumer devices, the NPU is intended to be small and power-efficient, yet reasonably fast when running small models. To this end, NPUs are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common performance metric is trillions of operations per second (TOPS), though this metric alone does not specify which kinds of operations are being performed.<ref>{{cite web |title=A guide to AI TOPS and NPU performance metrics |url=https://www.qualcomm.com/news/onq/2024/04/a-guide-to-ai-tops-and-npu-performance-metrics}}</ref>
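As a worked illustration, a peak TOPS figure follows directly from the number of multiply-accumulate (MAC) units and the clock rate; the numbers below are hypothetical and do not describe any particular NPU:

<syntaxhighlight lang="python">
# Peak-throughput estimate for a hypothetical NPU.
# All figures are illustrative assumptions, not device specifications.
mac_units = 4096      # parallel INT8 multiply-accumulate (MAC) units (assumed)
clock_hz = 1.5e9      # 1.5 GHz clock (assumed)
ops_per_mac = 2       # each MAC counts as one multiply plus one add

peak_tops = mac_units * clock_hz * ops_per_mac / 1e12
print(f"{peak_tops:.1f} TOPS")  # ~12.3 TOPS at INT8

# Narrower data types let the same silicon pack more operations per cycle,
# which is why a vendor's INT4 TOPS figure is often about twice its INT8 one.
</syntaxhighlight>

Because TOPS counts raw operations, the same device can advertise different figures at different precisions, so comparisons are only meaningful at a stated data type.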
 
=== Datacenters ===
Accelerators are used in [[cloud computing]] servers, including [[tensor processing unit]]s (TPU) in [[Google Cloud Platform]]<ref>{{Cite journal|date=2017-06-24|title=In-Datacenter Performance Analysis of a Tensor Processing Unit|journal=ACM SIGARCH Computer Architecture News|volume=45|issue=2|pages=1–12|language=EN|doi=10.1145/3140659.3080246|doi-access=free |last1=Jouppi |first1=Norman P. |last2=Young |first2=Cliff |last3=Patil |first3=Nishant |last4=Patterson |first4=David |last5=Agrawal |first5=Gaurav |last6=Bajwa |first6=Raminder |last7=Bates |first7=Sarah |last8=Bhatia |first8=Suresh |last9=Boden |first9=Nan |last10=Borchers |first10=Al |last11=Boyle |first11=Rick |last12=Cantin |first12=Pierre-luc |last13=Chao |first13=Clifford |last14=Clark |first14=Chris |last15=Coriell |first15=Jeremy |last16=Daley |first16=Mike |last17=Dau |first17=Matt |last18=Dean |first18=Jeffrey |last19=Gelb |first19=Ben |last20=Ghaemmaghami |first20=Tara Vazir |last21=Gottipati |first21=Rajendra |last22=Gulland |first22=William |last23=Hagmann |first23=Robert |last24=Ho |first24=C. Richard |last25=Hogberg |first25=Doug |last26=Hu |first26=John |last27=Hundt |first27=Robert |last28=Hurt |first28=Dan |last29=Ibarz |first29=Julian |last30=Jaffey |first30=Aaron |display-authors=1 |arxiv=1704.04760 }}</ref> and [[Trainium]] and [[Inferentia]] chips in [[Amazon Web Services]].<ref>{{cite web | title = How silicon innovation became the 'secret sauce' behind AWS's success| website = Amazon Science| date = July 27, 2022| url = https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success| access-date = July 19, 2024}}</ref> Many vendor-specific terms exist for devices in this category, and it is an [[emerging technologies|emerging technology]] without a [[dominant design]].
 
[[Graphics processing units]] designed by companies such as [[Nvidia]] and [[AMD]] often include AI-specific hardware, and are commonly used as AI accelerators, both for [[Machine learning|training]] and [[Inference engine|inference]].<ref>{{cite web| last1 = Patel| first1 = Dylan| last2 = Nishball| first2 = Daniel| last3 = Xie| first3 = Myron| title = Nvidia's New China AI Chips Circumvent US Restrictions| url=https://www.semianalysis.com/p/nvidias-new-china-ai-chips-circumvent| website = SemiAnalysis| date=2023-11-09| access-date=2024-02-07}}</ref>
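As an illustrative sketch of this usage, the following assumes a PyTorch installation with a CUDA-capable GPU; mixed-precision execution engages the GPU's AI-specific units (such as tensor cores) where available, and falls back to the CPU otherwise:

<syntaxhighlight lang="python">
import torch

# Use the GPU if present; eligible low-precision operations are routed to
# AI-specific hardware (e.g., tensor cores) automatically.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.bfloat16

model = torch.nn.Linear(512, 512).to(device)
x = torch.randn(8, 512, device=device)

# Autocast runs eligible layers in reduced precision; the same mechanism
# is used during training (together with gradient scaling) and inference.
with torch.autocast(device_type=device, dtype=dtype):
    y = model(x)

print(y.dtype)  # torch.float16 on a CUDA device
</syntaxhighlight>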
 
== Programming ==
Mobile NPU vendors typically provide their own [[application programming interface]] such as the Snapdragon Neural Processing Engine. An operating system or a higher-level library may provide a more generic interface such as TensorFlow Lite with LiteRT Next (Android) or CoreML (iOS, macOS).
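For illustration, the sketch below uses the TensorFlow Lite interpreter API (the runtime now distributed as LiteRT), assuming the tflite-runtime Python package; the model path and delegate library name are placeholders, and the delegate that actually routes work to an NPU is vendor-specific:

<syntaxhighlight lang="python">
import numpy as np
from tflite_runtime.interpreter import Interpreter  # , load_delegate

# A vendor/hardware delegate would be passed to target the NPU, e.g.:
# delegate = load_delegate("libvendor_npu_delegate.so")  # placeholder name
interpreter = Interpreter(model_path="model.tflite")     # placeholder model
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed a dummy input matching the model's expected shape and data type.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
</syntaxhighlight>

Without a delegate, the interpreter simply falls back to the CPU; the application code is unchanged either way, which is the point of such generic interfaces.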
Consumer CPU-integrated NPUs are accessible through vendor-specific APIs: AMD (Ryzen AI), Intel (OpenVINO), and Apple silicon (CoreML){{efn|MLX builds atop the CPU and GPU parts, not the Apple Neural Engine (ANE) part of Apple Silicon chips. The relatively good performance is due to the use of a large, fast [[unified memory]] design.}} each have their own APIs, which higher-level libraries can build upon.
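For example, Intel's OpenVINO toolkit exposes such an NPU as a named device target. The sketch below is illustrative only; it assumes an OpenVINO installation, a converted model file (the path is a placeholder), and an NPU with suitable drivers:

<syntaxhighlight lang="python">
import numpy as np
import openvino as ov

core = ov.Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU'], hardware-dependent

model = core.read_model("model.xml")            # placeholder IR model path
compiled = core.compile_model(model, device_name="NPU")

# Run inference with a dummy input (a static input shape is assumed).
input_shape = tuple(model.inputs[0].shape)
result = compiled(np.zeros(input_shape, dtype=np.float32))
</syntaxhighlight>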
GPUs generally use existing [[GPGPU]] pipelines such as CUDA and OpenCL adapted for lower precisions. Custom-built systems such as the Google TPU use private interfaces.

== Benchmarks ==
Benchmarks such as MLPerf may be used to evaluate the performance of AI accelerators.<ref>{{cite web | url=https://www.theregister.com/2022/09/09/nvidia_hopper_mlperf/ | title=Nvidia claims 'record performance' for Hopper MLPerf debut }}</ref> The table below lists several typical benchmarks.
{| class="wikitable"
|+Typical benchmarks for AI accelerators
!Year
!NN Benchmark
!Affiliations
!# of microbenchmarks
!# of component benchmarks
!# of application benchmarks
|-
|2012
|BenchNN
|ICT, CAS
|N/A
|12
|N/A
|-
|2016
|Fathom
|Harvard
|N/A
|8
|N/A
|-
|2017
|BenchIP
|ICT, CAS
|12
|11
|N/A
|-
|2017
|DAWNBench
|Stanford
|8
|N/A
|N/A
|-
|2017
|DeepBench
|Baidu
|4
|N/A
|N/A
|-
|2018
|AI Benchmark
|ETH Zurich
|N/A
|26
|N/A
|-
|2018
|MLPerf
|Harvard, Intel, Google, and others
|N/A
|7
|N/A
|-
|2019
|AIBench
|ICT, CAS, Alibaba, and others
|12
|16
|2
|-
|2019
|NNBench-X
|UCSB
|N/A
|10
|N/A
|}
 
== Potential applications ==
*[[Agricultural robot]]s, for example, herbicide-free weed control.<ref>{{cite web |title=Development of a machine vision system for weed control using precision chemical application |website=University of Florida |citeseerx = 10.1.1.7.342 |url=http://www.abe.ufl.edu/wlee/Publications/ICAME96.pdf |archive-url=https://web.archive.org/web/20100623062608/http://www.abe.ufl.edu/wlee/Publications/ICAME96.pdf|archive-date=June 23, 2010}}</ref>
*[[Vehicular automation|Autonomous vehicles]]: Nvidia has targeted their [[Drive PX-series]] boards at this application.<ref>{{cite web |url=https://www.nvidia.com/en-us/self-driving-cars/ |title=Self-Driving Cars Technology & Solutions from NVIDIA Automotive |website=NVIDIA}}</ref>
*[[Computer-aided diagnosis]]
*[[Industrial robot]]s, increasing the range of tasks that can be automated by adding adaptability to variable situations.
*[[Machine translation]]
*[[Military robot]]s
*[[Natural language processing]]
*[[Search engine]]s, increasing the [[energy efficiency in computing|energy efficiency]] of [[data center]]s and the ability to use increasingly advanced [[information retrieval|queries]].
*[[Unmanned aerial vehicle]]s, e.g. for navigation; the [[Movidius Myriad 2]] has been demonstrated successfully guiding autonomous drones.<ref>{{cite web |title=movidius powers worlds most intelligent drone |url=https://www.siliconrepublic.com/machines/movidius-dji-drone |date=March 16, 2016}}</ref>
*[[Voice user interface]], e.g. in mobile phones, a target for Qualcomm [[Zeroth (software)|Zeroth]].<ref>{{cite web |title=Qualcomm Research brings server class machine learning to everyday devices–making them smarter [VIDEO] |url=https://www.qualcomm.com/news/onq/2015/10/01/qualcomm-research-brings-server-class-machine-learning-everyday-devices-making |date=October 2015}}</ref>
 
== See also ==

*[[Cognitive computer]]
*[[Neuromorphic engineering]]
*[[Optical neural network]]
*[[Physical neural network]]
*[[UALink]]

==Notes==
{{notelist}}
 
== References ==
{{Reflist|32em}}
 
== External links ==
*[https://www.nextplatform.com/2016/04/05/nvidia-puts-accelerator-metal-pascal/ Nvidia Puts The Accelerator To The Metal With Pascal], The Next Platform
*[http://eyeriss.mit.edu/ Eyeriss Project], MIT
 
{{Hardware acceleration}}
 
[[Category:Application-specific integrated circuits]]
[[Category:Neural processing units| ]]