{{Use American English|date=January 2019}}
{{Use mdy dates|date=October 2021}}
A '''neural processing unit''' ('''NPU'''), also known as '''AI accelerator''' or '''deep learning processor''', is a class of specialized [[hardware accelerator]] or computer system designed to accelerate [[artificial intelligence]] (AI) and [[machine learning]] applications, including [[artificial neural network]]s and [[computer vision]].
== Use ==
Their purpose is either to efficiently execute already trained AI models (inference) or to train AI models. Their applications include [[algorithm]]s for [[robotics]], [[Internet of things]], and [[data (computing)|data]]-intensive or sensor-driven tasks.<ref>{{cite web |url=https://www.eetimes.com/google-designing-ai-processors/ |title=Google Designing AI Processors|date=May 18, 2016 }} Google using its own AI accelerators.</ref> They are often [[Manycore processor|manycore]] or [[Spatial architecture|spatial]] designs and focus on [[precision (computer science)|low-precision]] arithmetic, novel [[dataflow architecture]]s, or [[in-memory computing]] capability. {{As of|2024}}, a typical datacenter-grade AI [[integrated circuit]] chip, the H100 GPU, [[transistor count|contains tens of billions]] of [[MOSFET]]s.<ref>{{cite web|url=https://www.datacenterdynamics.com/en/news/nvidia-reveals-new-hopper-h100-gpu-with-80-billion-transistors/|title=Nvidia reveals new Hopper H100 GPU, with 80 billion transistors|last=Moss|first=Sebastian|date=2022-03-23|website=Data Center Dynamics|access-date=2024-01-30}}</ref>
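As an illustration of the low-precision arithmetic these designs favor, the following sketch shows symmetric INT8 quantization of floating-point values (one of several quantization schemes; the per-tensor scale choice here is illustrative only):

<syntaxhighlight lang="python">
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization (illustrative sketch)."""
    scale = np.max(np.abs(x)) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
# The round trip loses precision, which accelerators trade for speed and energy.
error = np.abs(weights - dequantize(q, scale)).max()
</syntaxhighlight>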
=== Consumer devices ===
AI accelerators are used in mobile devices such as Apple [[iPhone]]s, [[Huawei]], and [[Google Pixel]] smartphones,<ref>{{Cite web|url=https://consumer.huawei.com/en/press/news/2017/ifa2017-kirin970|title=HUAWEI Reveals the Future of Mobile AI at IFA}}</ref> in AMD [[AI engine|AI engines]] in Versal devices and NPUs,<ref>{{Cite book |last=Brown |first=Nick |chapter=Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation |date=2023-02-12 |title=Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays |chapter-url=https://dl.acm.org/doi/10.1145/3543622.3573047 |series=FPGA '23 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=91–97 |doi=10.1145/3543622.3573047 |isbn=978-1-4503-9417-8|arxiv=2301.13016 }}</ref> and in many [[Apple silicon]], [[Qualcomm]], [[Samsung]], and [[Google Tensor]] smartphone processors.<ref>{{Cite web| title=Snapdragon 8 Gen 3 mobile platform | url=https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-url=https://web.archive.org/web/20231025162610/https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-date=2023-10-25}}</ref> They are also built into some [[Intel]]<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intels-lunar-lake-processors-arriving-q3-2024.html|title=Intel's Lunar Lake Processors Arriving Q3 2024|website=Intel|date=May 20, 2024 }}</ref> and [[AMD]]<ref>{{cite web|title=AMD XDNA Architecture|url=https://www.amd.com/en/technologies/xdna.html}}</ref> computer processors and [[Apple silicon]] [[Mac (computer)|Macs]].<ref>{{Cite web |title=Deploying Transformers on the Apple Neural Engine |url=https://machinelearning.apple.com/research/neural-engine-transformers |access-date=2023-08-24 |website=Apple Machine Learning Research |language=en-US}}</ref> All models of Intel [[Meteor Lake]] processors have a built-in ''versatile processor unit'' (''VPU'') for accelerating [[statistical inference|inference]] for computer vision and deep learning.<ref>{{Cite web|url=https://www.pcmag.com/news/intel-to-bring-a-vpu-processor-unit-to-14th-gen-meteor-lake-chips|title=Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips|website=PCMAG|date=August 2022 }}</ref>
On consumer devices, the NPU is intended to be small, power-efficient, and reasonably fast when running small models. To this end, NPUs are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common performance metric is trillions of operations per second (TOPS), though this metric alone does not specify which kind of operations are being counted.<ref>{{cite web |title=A guide to AI TOPS and NPU performance metrics |url=https://www.qualcomm.com/news/onq/2024/04/a-guide-to-ai-tops-and-npu-performance-metrics}}</ref>
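The arithmetic behind a TOPS figure is straightforward; the following sketch uses made-up hardware parameters (not taken from any datasheet) and the common convention that one multiply-accumulate counts as two operations:

<syntaxhighlight lang="python">
# Hypothetical NPU: the parameter values below are illustrative only.
mac_units = 4096     # number of INT8 multiply-accumulate (MAC) units
clock_hz = 1.8e9     # sustained clock frequency in hertz
ops_per_mac = 2      # one MAC counts as a multiply plus an add

tops = mac_units * ops_per_mac * clock_hz / 1e12
print(f"{tops:.1f} TOPS (INT8)")  # ~14.7 TOPS for these made-up numbers
</syntaxhighlight>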
=== Datacenters ===
[[Graphics processing units]] designed by companies such as [[Nvidia]] and [[AMD]] often include AI-specific hardware, and are commonly used as AI accelerators, both for [[Machine learning|training]] and [[Inference engine|inference]].<ref>{{cite web| last1 = Patel| first1 = Dylan| last2 = Nishball| first2 = Daniel| last3 = Xie| first3 = Myron| title = Nvidia's New China AI Chips Circumvent US Restrictions| url=https://www.semianalysis.com/p/nvidias-new-china-ai-chips-circumvent| website = SemiAnalysis| date=2023-11-09| access-date=2024-02-07}}</ref> Accelerators are also used in [[cloud computing]] servers, including [[tensor processing unit]]s (TPU) in [[Google Cloud Platform]]<ref>{{Cite journal|date=2017-06-24|title=In-Datacenter Performance Analysis of a Tensor Processing Unit|journal=ACM SIGARCH Computer Architecture News|volume=45|issue=2|pages=1–12|language=EN|doi=10.1145/3140659.3080246|doi-access=free |last1=Jouppi |first1=Norman P. |last2=Young |first2=Cliff |last3=Patil |first3=Nishant |last4=Patterson |first4=David |last5=Agrawal |first5=Gaurav |last6=Bajwa |first6=Raminder |last7=Bates |first7=Sarah |last8=Bhatia |first8=Suresh |last9=Boden |first9=Nan |last10=Borchers |first10=Al |last11=Boyle |first11=Rick |last12=Cantin |first12=Pierre-luc |last13=Chao |first13=Clifford |last14=Clark |first14=Chris |last15=Coriell |first15=Jeremy |last16=Daley |first16=Mike |last17=Dau |first17=Matt |last18=Dean |first18=Jeffrey |last19=Gelb |first19=Ben |last20=Ghaemmaghami |first20=Tara Vazir |last21=Gottipati |first21=Rajendra |last22=Gulland |first22=William |last23=Hagmann |first23=Robert |last24=Ho |first24=C. Richard |last25=Hogberg |first25=Doug |last26=Hu |first26=John |last27=Hundt |first27=Robert |last28=Hurt |first28=Dan |last29=Ibarz |first29=Julian |last30=Jaffey |first30=Aaron |display-authors=1 |arxiv=1704.04760 }}</ref> and [[Trainium]] and [[Inferentia]] chips in [[Amazon Web Services]].<ref>{{cite web | title = How silicon innovation became the 'secret sauce' behind AWS's success| website = Amazon Science| date = July 27, 2022| url = https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success| access-date = July 19, 2024}}</ref> Many vendor-specific terms exist for devices in this category, and it is an [[emerging technologies|emerging technology]] without a [[dominant design]].
== Programming ==
Mobile NPU vendors typically provide their own [[application programming interface]] (API), such as the Snapdragon Neural Processing Engine. An operating system or a higher-level library may provide a more generic interface, such as TensorFlow Lite with LiteRT Next (Android) or CoreML (iOS, macOS).
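A minimal sketch of how an application might run a model through such a generic interface, using the TensorFlow Lite Python API (the model file and delegate library names below are placeholders; real delegate binaries are vendor-specific):

<syntaxhighlight lang="python">
import numpy as np
import tensorflow as tf

# A vendor NPU delegate (library name is a placeholder) routes supported
# operators to the accelerator; unsupported ones fall back to the CPU.
delegate = tf.lite.experimental.load_delegate("libvendor_npu_delegate.so")
interpreter = tf.lite.Interpreter(model_path="model.tflite",
                                  experimental_delegates=[delegate])
interpreter.allocate_tensors()

# Feed a dummy input and run one inference.
input_info = interpreter.get_input_details()[0]
interpreter.set_tensor(input_info["index"],
                       np.zeros(input_info["shape"], dtype=input_info["dtype"]))
interpreter.invoke()
result = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
</syntaxhighlight>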
Consumer CPU-integrated NPUs are accessible through vendor-specific APIs: AMD (Ryzen AI), Intel (OpenVINO), and Apple silicon (CoreML){{efn|MLX builds atop the CPU and GPU parts, not the Apple Neural Engine (ANE) part of Apple Silicon chips. The relatively good performance is due to the use of a large, fast [[unified memory]] design.}} each provide their own interface, upon which a higher-level library can build.
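As an example of one such vendor API, the following OpenVINO sketch compiles a model for the NPU device (the model file is a placeholder; the "NPU" device string is available in recent OpenVINO releases):

<syntaxhighlight lang="python">
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # placeholder IR model file
compiled = core.compile_model(model, "NPU")  # device selection; "CPU" also works

# Run one inference with a dummy input matching the model's static shape.
input_data = np.zeros(tuple(compiled.input(0).shape), dtype=np.float32)
result = compiled([input_data])[compiled.output(0)]
</syntaxhighlight>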
GPUs generally use existing [[GPGPU]] pipelines such as CUDA and OpenCL adapted for lower precisions. Custom-built systems such as the Google TPU use proprietary interfaces.
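A minimal sketch of such an adapted pipeline using PyTorch's CUDA backend, which dispatches eligible operations in half precision (this assumes a CUDA-capable GPU is present):

<syntaxhighlight lang="python">
import torch

# A toy matrix multiplication executed on the GPU.
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

# Autocast runs eligible operations in FP16, using the GPU's
# lower-precision (e.g. tensor-core) path where available.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b
print(c.dtype)  # torch.float16
</syntaxhighlight>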
== Notes ==
{{notelist}}
== References ==
{{Reflist}}