{{Use American English|date=January 2019}}
{{Use mdy dates|date=October 2021}}
A '''neural processing unit''' ('''NPU'''), also known as '''AI accelerator''' or '''deep learning processor''', is a class of specialized [[hardware accelerator]] or computer system designed to accelerate [[artificial intelligence]] (AI) and [[machine learning]] applications, including [[artificial neural network]]s and [[computer vision]].
== Use ==
Their purpose is either to efficiently execute already trained AI models (inference) or to train AI models. Their applications include [[algorithm]]s for [[robotics]], [[Internet of things]], and [[data (computing)|data]]-intensive or sensor-driven tasks.<ref>{{cite web |url=https://www.eetimes.com/google-designing-ai-processors/ |title=Google Designing AI Processors|date=May 18, 2016 }} Google using its own AI accelerators.</ref> They are often [[Manycore processor|manycore]] or [[Spatial architecture|spatial]] designs and focus on [[precision (computer science)|low-precision]] arithmetic, novel [[dataflow architecture]]s, or [[in-memory computing]] capability. {{As of|2024}}, a typical datacenter-grade AI [[integrated circuit]] chip, the H100 GPU, [[transistor count|contains tens of billions]] of [[MOSFET]]s.<ref>{{cite web|url=https://www.datacenterdynamics.com/en/news/nvidia-reveals-new-hopper-h100-gpu-with-80-billion-transistors/|title=Nvidia reveals new Hopper H100 GPU, with 80 billion transistors|last=Moss|first=Sebastian|date=2022-03-23|website=Data Center Dynamics|access-date=2024-01-30}}</ref>
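As an illustration of the low-precision arithmetic these designs favor, the following sketch shows symmetric INT8 quantization of floating-point values (one of several quantization schemes; the per-tensor scale choice here is illustrative only):

<syntaxhighlight lang="python">
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization (illustrative sketch)."""
    scale = np.max(np.abs(x)) / 127.0  # map the largest magnitude to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(weights)
# The round trip loses precision, which accelerators trade for speed and energy.
error = np.abs(weights - dequantize(q, scale)).max()
</syntaxhighlight>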
=== Consumer devices ===
AI accelerators are used in mobile devices such as Apple [[iPhone]]s, [[Huawei]], and [[Google Pixel]] smartphones,<ref>{{Cite web|url=https://consumer.huawei.com/en/press/news/2017/ifa2017-kirin970|title=HUAWEI Reveals the Future of Mobile AI at IFA}}</ref> in AMD [[AI engine|AI engines]] in Versal devices and NPUs,<ref>{{Cite book |last=Brown |first=Nick |chapter=Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation |date=2023-02-12 |title=Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays |chapter-url=https://dl.acm.org/doi/10.1145/3543622.3573047 |series=FPGA '23 |___location=New York, NY, USA |publisher=Association for Computing Machinery |pages=91–97 |doi=10.1145/3543622.3573047 |isbn=978-1-4503-9417-8|arxiv=2301.13016 }}</ref> and in many [[Apple silicon]], [[Qualcomm]], [[Samsung]], and [[Google Tensor]] smartphone processors.<ref>{{Cite web| title=Snapdragon 8 Gen 3 mobile platform | url=https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-url=https://web.archive.org/web/20231025162610/https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-date=2023-10-25}}</ref> They are also built into some [[Intel]]<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intels-lunar-lake-processors-arriving-q3-2024.html|title=Intel's Lunar Lake Processors Arriving Q3 2024|website=Intel|date=May 20, 2024 }}</ref> and [[AMD]]<ref>{{cite web|title=AMD XDNA Architecture|url=https://www.amd.com/en/technologies/xdna.html}}</ref> computer processors and [[Apple silicon]] [[Mac (computer)|Macs]].<ref>{{Cite web |title=Deploying Transformers on the Apple Neural Engine |url=https://machinelearning.apple.com/research/neural-engine-transformers |access-date=2023-08-24 |website=Apple Machine Learning Research |language=en-US}}</ref> All models of Intel [[Meteor Lake]] processors have a built-in ''versatile processor unit'' (''VPU'') for accelerating [[statistical inference|inference]] for computer vision and deep learning.<ref>{{Cite web|url=https://www.pcmag.com/news/intel-to-bring-a-vpu-processor-unit-to-14th-gen-meteor-lake-chips|title=Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips|website=PCMAG|date=August 2022 }}</ref>
On consumer devices, the NPU is intended to be small, power-efficient, and reasonably fast when running small models. To this end, NPUs are designed to support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16. A common performance metric is trillions of operations per second (TOPS), though this metric alone does not specify which kind of operations are being counted.<ref>{{cite web |title=A guide to AI TOPS and NPU performance metrics |url=https://www.qualcomm.com/news/onq/2024/04/a-guide-to-ai-tops-and-npu-performance-metrics}}</ref>
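The arithmetic behind a TOPS figure is straightforward; the following sketch uses made-up hardware parameters (not taken from any datasheet) and the common convention that one multiply-accumulate counts as two operations:

<syntaxhighlight lang="python">
# Hypothetical NPU: the parameter values below are illustrative only.
mac_units = 4096     # number of INT8 multiply-accumulate (MAC) units
clock_hz = 1.8e9     # sustained clock frequency in hertz
ops_per_mac = 2      # one MAC counts as a multiply plus an add

tops = mac_units * ops_per_mac * clock_hz / 1e12
print(f"{tops:.1f} TOPS (INT8)")  # ~14.7 TOPS for these made-up numbers
</syntaxhighlight>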
=== Datacenters ===
[[Graphics processing units]] designed by companies such as [[Nvidia]] and [[AMD]] often include AI-specific hardware, and are commonly used as AI accelerators, both for [[Machine learning|training]] and [[Inference engine|inference]].<ref>{{cite web| last1 = Patel| first1 = Dylan| last2 = Nishball| first2 = Daniel| last3 = Xie| first3 = Myron| title = Nvidia's New China AI Chips Circumvent US Restrictions| url=https://www.semianalysis.com/p/nvidias-new-china-ai-chips-circumvent| website = SemiAnalysis| date=2023-11-09| access-date=2024-02-07}}</ref> Accelerators are also used in [[cloud computing]] servers, including [[tensor processing unit]]s (TPU) in [[Google Cloud Platform]]<ref>{{Cite journal|date=2017-06-24|title=In-Datacenter Performance Analysis of a Tensor Processing Unit|journal=ACM SIGARCH Computer Architecture News|volume=45|issue=2|pages=1–12|language=EN|doi=10.1145/3140659.3080246|doi-access=free |last1=Jouppi |first1=Norman P. |last2=Young |first2=Cliff |last3=Patil |first3=Nishant |last4=Patterson |first4=David |last5=Agrawal |first5=Gaurav |last6=Bajwa |first6=Raminder |last7=Bates |first7=Sarah |last8=Bhatia |first8=Suresh |last9=Boden |first9=Nan |last10=Borchers |first10=Al |last11=Boyle |first11=Rick |last12=Cantin |first12=Pierre-luc |last13=Chao |first13=Clifford |last14=Clark |first14=Chris |last15=Coriell |first15=Jeremy |last16=Daley |first16=Mike |last17=Dau |first17=Matt |last18=Dean |first18=Jeffrey |last19=Gelb |first19=Ben |last20=Ghaemmaghami |first20=Tara Vazir |last21=Gottipati |first21=Rajendra |last22=Gulland |first22=William |last23=Hagmann |first23=Robert |last24=Ho |first24=C. Richard |last25=Hogberg |first25=Doug |last26=Hu |first26=John |last27=Hundt |first27=Robert |last28=Hurt |first28=Dan |last29=Ibarz |first29=Julian |last30=Jaffey |first30=Aaron |display-authors=1 |arxiv=1704.04760 }}</ref> and [[Trainium]] and [[Inferentia]] chips in [[Amazon Web Services]].<ref>{{cite web | title = How silicon innovation became the 'secret sauce' behind AWS's success| website = Amazon Science| date = July 27, 2022| url = https://www.amazon.science/how-silicon-innovation-became-the-secret-sauce-behind-awss-success| access-date = July 19, 2024}}</ref> Many vendor-specific terms exist for devices in this category, and it is an [[emerging technologies|emerging technology]] without a [[dominant design]].
== Programming ==
Mobile NPU vendors typically provide their own [[application programming interface]] (API), such as the Snapdragon Neural Processing Engine. An operating system or a higher-level library may provide a more generic interface, such as TensorFlow Lite with LiteRT Next (Android) or CoreML (iOS, macOS).
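A minimal sketch of how an application might run a model through such a generic interface, using the TensorFlow Lite Python API (the model file and delegate library names below are placeholders; real delegate binaries are vendor-specific):

<syntaxhighlight lang="python">
import numpy as np
import tensorflow as tf

# A vendor NPU delegate (library name is a placeholder) routes supported
# operators to the accelerator; unsupported ones fall back to the CPU.
delegate = tf.lite.experimental.load_delegate("libvendor_npu_delegate.so")
interpreter = tf.lite.Interpreter(model_path="model.tflite",
                                  experimental_delegates=[delegate])
interpreter.allocate_tensors()

# Feed a dummy input and run one inference.
input_info = interpreter.get_input_details()[0]
interpreter.set_tensor(input_info["index"],
                       np.zeros(input_info["shape"], dtype=input_info["dtype"]))
interpreter.invoke()
result = interpreter.get_tensor(interpreter.get_output_details()[0]["index"])
</syntaxhighlight>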
Consumer CPU-integrated NPUs are accessible through vendor-specific APIs: AMD (Ryzen AI), Intel (OpenVINO), and Apple silicon (CoreML){{efn|MLX builds atop the CPU and GPU parts, not the Apple Neural Engine (ANE) part of Apple Silicon chips. The relatively good performance is due to the use of a large, fast [[unified memory]] design.}} each provide their own interface, upon which a higher-level library can build.
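As an example of one such vendor API, the following OpenVINO sketch compiles a model for the NPU device (the model file is a placeholder; the "NPU" device string is available in recent OpenVINO releases):

<syntaxhighlight lang="python">
import numpy as np
import openvino as ov

core = ov.Core()
model = core.read_model("model.xml")         # placeholder IR model file
compiled = core.compile_model(model, "NPU")  # device selection; "CPU" also works

# Run one inference with a dummy input matching the model's static shape.
input_data = np.zeros(tuple(compiled.input(0).shape), dtype=np.float32)
result = compiled([input_data])[compiled.output(0)]
</syntaxhighlight>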
GPUs generally use existing [[GPGPU]] pipelines such as CUDA and OpenCL adapted for lower precisions. Custom-built systems such as the Google TPU use proprietary interfaces.
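A minimal sketch of such an adapted pipeline using PyTorch's CUDA backend, which dispatches eligible operations in half precision (this assumes a CUDA-capable GPU is present):

<syntaxhighlight lang="python">
import torch

# A toy matrix multiplication executed on the GPU.
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

# Autocast runs eligible operations in FP16, using the GPU's
# lower-precision (e.g. tensor-core) path where available.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    c = a @ b
print(c.dtype)  # torch.float16
</syntaxhighlight>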
== Notes ==
{{notelist}}
== References ==
{{Reflist}}