Neural processing unit: Difference between revisions

Content deleted Content added
not about NPUs
Citation bot (talk | contribs)
Alter: title, template type. Add: chapter-url, chapter. Removed or converted URL. Removed parameters. Some additions/deletions were parameter name changes. | Use this bot. Report bugs. | Suggested by Headbomb | #UCB_toolbar
 
(18 intermediate revisions by 14 users not shown)
Line 3:
{{Use mdy dates|date=October 2021}}
 
A '''neural processing unit''' ('''NPU'''), also known as '''AI accelerator''' or '''deep learning processor,''', is a class of specialized [[hardware acceleration|hardware accelerator]]<ref>{{cite web |url=https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |title=Intel unveils Movidius Compute Stick USB AI Accelerator |date=July 21, 2017 |access-date=August 11, 2017 |archive-url=https://web.archive.org/web/20170811193632/https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |archive-date=August 11, 2017 }}</ref> or computer system<ref>{{cite web |url=https://insidehpc.com/2017/06/inspurs-unveils-gx4-ai-accelerator/ |title=Inspurs unveils GX4 AI Accelerator |date=June 21, 2017}}</ref><ref>{{citation |title=Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors |last=Wiggers |first=Kyle |date=November 6, 2019 |url=https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |publication-date=November 6, 2019 |orig-date=2019 |archive-url=https://web.archive.org/web/20200306120524/https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |archive-date=March 6, 2020 |access-date=March 14, 2020}}</ref> designed to accelerate [[artificial intelligence]] (AI) and [[machine learning]] applications, including [[artificial neural network]]s and [[computer vision]].
 
==Use==
They canTheir bepurpose usedis either to efficiently execute already trained AI models (inference) or forto trainingtrain AI models. TypicalTheir applications include [[algorithm]]s for [[robotics]], [[Internet of Thingsthings]], and other [[data (computing)|data]]-intensive or sensor-driven tasks.<ref>{{cite web |url=https://www.eetimes.com/google-designing-ai-processors/ |title=Google Designing AI Processors|date=May 18, 2016 }} Google using its own AI accelerators.</ref> They are often [[Manycore processor|manycore]] or [[Spatial architecture|spatial]] designs and generally focus on [[precision (computer science)|low-precision]] arithmetic, novel [[dataflow architecture]]s, or [[in-memory computing]] capability. {{As of|2024}}, a typical datacenter-grade AI [[integrated circuit]] chip, the H100 GPU, [[transistor count|contains tens of billions]] of [[MOSFET]]s.<ref>{{cite web|url=https://www.datacenterdynamics.com/en/news/nvidia-reveals-new-hopper-h100-gpu-with-80-billion-transistors/|title=Nvidia reveals new Hopper H100 GPU, with 80 billion transistors|last=Moss|first=Sebastian|date=2022-03-23|website=Data Center Dynamics|access-date=2024-01-30}}</ref>
 
=== Consumer devices ===
AI accelerators are used in mobile devices such as Apple [[iPhone]]s and [[Huawei]] cellphones,<ref>{{Cite web|url=https://consumer.huawei.com/en/press/news/2017/ifa2017-kirin970|title=HUAWEI Reveals the Future of Mobile AI at IFA}}</ref> and personal computers such as [[Intel]] laptops,<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intels-lunar-lake-processors-arriving-q3-2024.html|title=Intel's Lunar Lake Processors Arriving Q3 2024|website=Intel|date=May 20, 2024 }}</ref> [[AMD]] laptops<ref>{{cite web|title=AMD XDNA Architecture|url=https://www.amd.com/en/technologies/xdna.html}}</ref> and [[Apple silicon]] [[Mac (computer)|Macs]].<ref>{{Cite web |title=Deploying Transformers on the Apple Neural Engine |url=https://machinelearning.apple.com/research/neural-engine-transformers |access-date=2023-08-24 |website=Apple Machine Learning Research |language=en-US}}</ref> Accelerators are used in [[cloud computing]] servers, including [[tensor processing unit]]s (TPU) in [[Google Cloud Platform]]<ref>{{Cite journal|date=2017-06-24|title=In-Datacenter Performance Analysis of a Tensor Processing Unit|journal=ACM SIGARCH Computer Architecture News|volume=45|issue=2|pages=1–12|language=EN|doi=10.1145/3140659.3080246|doi-access=free |last1=Jouppi |first1=Norman P. |last2=Young |first2=Cliff |last3=Patil |first3=Nishant |last4=Patterson |first4=David |last5=Agrawal |first5=Gaurav |last6=Bajwa |first6=Raminder |last7=Bates |first7=Sarah |last8=Bhatia |first8=Suresh |last9=Boden |first9=Nan |last10=Borchers |first10=Al |last11=Boyle |first11=Rick |last12=Cantin |first12=Pierre-luc |last13=Chao |first13=Clifford |last14=Clark |first14=Chris |last15=Coriell |first15=Jeremy |last16=Daley |first16=Mike |last17=Dau |first17=Matt |last18=Dean |first18=Jeffrey |last19=Gelb |first19=Ben |last20=Ghaemmaghami |first20=Tara Vazir |last21=Gottipati |first21=Rajendra |last22=Gulland |first22=William |last23=Hagmann |first23=Robert |last24=Ho |first24=C. Richard |last25=Hogberg |first25=Doug |last26=Hu |first26=John |last27=Hundt |first27=Robert |last28=Hurt |first28=Dan |last29=Ibarz |first29=Julian |last30=Jaffey |first30=Aaron |display-authors=1 |arxiv=1704.04760 }}</ref> and [[Trainium]] and [[Inferentia]] chips in [[Amazon Web Services]].<ref>{{cite web | title = How silicon innovation became the 'secret sauce' behind AWS's success| website = Amazon Science| date = July 27, 2022| url = https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success| access-date = July 19, 2024}}</ref> Many vendor-specific terms exist for devices in this category, and it is an [[emerging technologies|emerging technology]] without a [[dominant design]].
AI accelerators are used in mobile devices such as Apple [[iPhone]]s, AMD [[AI engine|AI engines]]<ref>{{Cite book |last=Brown |first=Nick |chapter=Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation |date=2023-02-12 |title=Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays |chapter-url=https://dl.acm.org/doi/10.1145/3543622.3573047 |series=FPGA '23 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=91–97 |doi=10.1145/3543622.3573047 |isbn=978-1-4503-9417-8|arxiv=2301.13016 }}</ref> in Versal and NPUs, [[Huawei]], and [[Google Pixel]] smartphones,<ref>{{Cite web|url=https://consumer.huawei.com/en/press/news/2017/ifa2017-kirin970|title=HUAWEI Reveals the Future of Mobile AI at IFA}}</ref> and seen in many [[Apple silicon]], [[Qualcomm]], [[Samsung]], and [[Google Tensor]] smartphone processors.<ref>{{Cite web| title=Snapdragon 8 Gen 3 mobile platform | url=https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-url=https://web.archive.org/web/20231025162610/https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-date=2023-10-25}}</ref>
 
It is more recently (circa 2022) added to computer processors from [[Intel]],<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intels-lunar-lake-processors-arriving-q3-2024.html|title=Intel's Lunar Lake Processors Arriving Q3 2024|website=Intel|date=May 20, 2024 }}</ref> [[AMD]],<ref>{{cite web|title=AMD XDNA Architecture|url=https://www.amd.com/en/technologies/xdna.html}}</ref> and Apple silicon.<ref>{{Cite web |title=Deploying Transformers on the Apple Neural Engine |url=https://machinelearning.apple.com/research/neural-engine-transformers |access-date=2023-08-24 |website=Apple Machine Learning Research |language=en-US}}</ref> All models of Intel [[Meteor Lake]] processors have a built-in ''versatile processor unit'' (''VPU'') for accelerating [[statistical inference|inference]] for computer vision and deep learning.<ref>{{Cite web|url=https://www.pcmag.com/news/intel-to-bring-a-vpu-processor-unit-to-14th-gen-meteor-lake-chips|title=Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips|website=PCMAG|date=August 2022 }}</ref>
 
On consumer devices, the NPU is intended to be small, power-efficient, but reasonably fast when used to run small models. To do this they are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common metric is trillions of operations per second (TOPS), though this metric alone does not quantify which kind of operations are being done.<ref>{{cite web |title=A guide to AI TOPS and NPU performance metrics |url=https://www.qualcomm.com/news/onq/2024/04/a-guide-to-ai-tops-and-npu-performance-metrics}}</ref>
 
=== Datacenters ===
AI accelerators are used in mobile devices such as Apple [[iPhone]]s and [[Huawei]] cellphones,<ref>{{Cite web|url=https://consumer.huawei.com/en/press/news/2017/ifa2017-kirin970|title=HUAWEI Reveals the Future of Mobile AI at IFA}}</ref> and personal computers such as [[Intel]] laptops,<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intels-lunar-lake-processors-arriving-q3-2024.html|title=Intel's Lunar Lake Processors Arriving Q3 2024|website=Intel|date=May 20, 2024 }}</ref> [[AMD]] laptops<ref>{{cite web|title=AMD XDNA Architecture|url=https://www.amd.com/en/technologies/xdna.html}}</ref> and [[Apple silicon]] [[Mac (computer)|Macs]].<ref>{{Cite web |title=Deploying Transformers on the Apple Neural Engine |url=https://machinelearning.apple.com/research/neural-engine-transformers |access-date=2023-08-24 |website=Apple Machine Learning Research |language=en-US}}</ref> Accelerators are used in [[cloud computing]] servers, including [[tensor processing unit]]s (TPU) in [[Google Cloud Platform]]<ref>{{Cite journal|date=2017-06-24|title=In-Datacenter Performance Analysis of a Tensor Processing Unit|journal=ACM SIGARCH Computer Architecture News|volume=45|issue=2|pages=1–12|language=EN|doi=10.1145/3140659.3080246|doi-access=free |last1=Jouppi |first1=Norman P. |last2=Young |first2=Cliff |last3=Patil |first3=Nishant |last4=Patterson |first4=David |last5=Agrawal |first5=Gaurav |last6=Bajwa |first6=Raminder |last7=Bates |first7=Sarah |last8=Bhatia |first8=Suresh |last9=Boden |first9=Nan |last10=Borchers |first10=Al |last11=Boyle |first11=Rick |last12=Cantin |first12=Pierre-luc |last13=Chao |first13=Clifford |last14=Clark |first14=Chris |last15=Coriell |first15=Jeremy |last16=Daley |first16=Mike |last17=Dau |first17=Matt |last18=Dean |first18=Jeffrey |last19=Gelb |first19=Ben |last20=Ghaemmaghami |first20=Tara Vazir |last21=Gottipati |first21=Rajendra |last22=Gulland |first22=William |last23=Hagmann |first23=Robert |last24=Ho |first24=C. Richard |last25=Hogberg |first25=Doug |last26=Hu |first26=John |last27=Hundt |first27=Robert |last28=Hurt |first28=Dan |last29=Ibarz |first29=Julian |last30=Jaffey |first30=Aaron |display-authors=1 |arxiv=1704.04760 }}</ref> and [[Trainium]] and [[Inferentia]] chips in [[Amazon Web Services]].<ref>{{cite web | title = How silicon innovation became the 'secret sauce' behind AWS's success| website = Amazon Science| date = July 27, 2022| url = https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success| access-date = July 19, 2024}}</ref> Many vendor-specific terms exist for devices in this category, and it is an [[emerging technologies|emerging technology]] without a [[dominant design]].
 
[[Graphics processing units]] designed by companies such as [[Nvidia]] and [[AMD]] often include AI-specific hardware, and are commonly used as AI accelerators, both for [[Machine learning|training]] and [[Inference engine|inference]].<ref>{{cite web| last1 = Patel| first1 = Dylan| last2 = Nishball| first2 = Daniel| last3 = Xie| first3 = Myron| title = Nvidia's New China AI Chips Circumvent US Restrictions| url=https://www.semianalysis.com/p/nvidias-new-china-ai-chips-circumvent| website = SemiAnalysis| date=2023-11-09| access-date=2024-02-07}}</ref>
 
== Programming ==
Neural Processing Units (NPU) are another more native approach. Since 2017, several CPUs and SoCs have on-die NPUs: for example, [[Meteor Lake (microarchitecture)|Intel Meteor Lake]], [[Lunar Lake]], and [[Apple A11]].
Mobile NPU vendors typically provide their own [[application programming interface]] such as the Snapdragon Neural Processing Engine. An operating system or a higher-level library may provide a more generic interface such as TensorFlow Lite with LiteRT Next (Android) or CoreML (iOS, macOS).
 
Consumer CPU-integrated NPUs are accessible through vendor-specific APIs. AMD (Ryzen AI), Intel (OpenVINO), Apple silicon (CoreML){{efn|MLX builds atop the CPU and GPU parts, not the Apple Neural Engine (ANE) part of Apple Silicon chips. The relatively good performance is due to the use of a large, fast [[unified memory]] design.}} each have their own APIs, which can be built upon by a higher-level library.
All models of Intel [[Meteor Lake]] processors have a ''Versatile Processor Unit'' (''VPU'') built-in for accelerating [[statistical inference|inference]] for computer vision and deep learning.<ref>{{Cite web|url=https://www.pcmag.com/news/intel-to-bring-a-vpu-processor-unit-to-14th-gen-meteor-lake-chips|title=Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips|website=PCMAG|date=August 2022 }}</ref>
 
GPUs generally use existing [[GPGPU]] pipelines such as CUDA and OpenCL adapted for lower precisions. Custom-built systems such as the Google TPU use private interfaces.
== Potential applications ==
*[[Agricultural robot]]s, for example, herbicide-free weed control.<ref>{{cite web |title=Development of a machine vision system for weed control using precision chemical application |website=University of Florida |citeseerx = 10.1.1.7.342 |url=http://www.abe.ufl.edu/wlee/Publications/ICAME96.pdf |archive-url=https://web.archive.org/web/20100623062608/http://www.abe.ufl.edu/wlee/Publications/ICAME96.pdf|archive-date=June 23, 2010}}</ref>
*[[Vehicular automation|Autonomous vehicles]]: Nvidia has targeted their [[Drive PX-series]] boards at this application.<ref>{{cite web |url=https://www.nvidia.com/en-us/self-driving-cars/ |title=Self-Driving Cars Technology & Solutions from NVIDIA Automotive |website=NVIDIA}}</ref>
*[[Computer-aided diagnosis]]
*[[Industrial robot]]s, increasing the range of tasks that can be automated, by adding adaptability to variable situations.
*[[Machine translation]]
*[[Military robot]]s
*[[Natural language processing]]
*[[Search engine]]s, increasing the [[energy efficiency in computing|energy efficiency]] of [[data center]]s and the ability to use increasingly advanced [[information retrieval|queries]].
*[[Unmanned aerial vehicle]]s, e.g. navigation systems, e.g. the [[Movidius Myriad 2]] has been demonstrated successfully guiding autonomous drones.<ref>{{cite web |title=movidius powers worlds most intelligent drone |url=https://www.siliconrepublic.com/machines/movidius-dji-drone |date=March 16, 2016}}</ref>
*[[Voice user interface]], e.g. in mobile phones, a target for Qualcomm [[Zeroth (software)|Zeroth]].<ref>{{cite web |title=Qualcomm Research brings server class machine learning to everyday devices–making them smarter [VIDEO] |url=https://www.qualcomm.com/news/onq/2015/10/01/qualcomm-research-brings-server-class-machine-learning-everyday-devices-making |date=October 2015}}</ref>
 
== See also Notes==
{{notelist}}
*[[Cognitive computer]]
*[[Neuromorphic engineering]]
*[[Optical neural network]]
*[[Physical neural network]]
*[[UALink]]
 
== References ==
{{Reflist|32em}}
 
== External links ==
*[https://www.nextplatform.com/2016/04/05/nvidia-puts-accelerator-metal-pascal/ Nvidia Puts The Accelerator To The Metal With Pascal.htm], The Next Platform
*[http://eyeriss.mit.edu/ Eyeriss Project], MIT
*https://alphaics.ai/
 
{{Hardware acceleration}}
 
[[Category:Application-specific integrated circuits]]
[[Category:Neural processing units| ]]