Neural processing unit: Difference between revisions


Revision as of 12:00, 10 July 2025 by Artoria2e5
Latest revision as of 20:06, 8 August 2025 by Citation bot (edit summary: Alter: title, template type. Add: chapter-url, chapter. Removed or converted URL. Removed parameters. Some additions/deletions were parameter name changes. Suggested by Headbomb.)
(9 intermediate revisions by 8 users not shown)
Line 2:
{{Use American English|date=January 2019}}
{{Use mdy dates|date=October 2021}}
A '''neural processing unit''' ('''NPU'''), also known as an '''AI accelerator''' or '''deep learning processor''', is a class of specialized [[hardware acceleration|hardware accelerator]]<ref>{{cite web |url=https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |title=Intel unveils Movidius Compute Stick USB AI Accelerator |date=July 21, 2017 |access-date=August 11, 2017 |archive-url=https://web.archive.org/web/20170811193632/https://www.v3.co.uk/v3-uk/news/3014293/intel-unveils-movidius-compute-stick-usb-ai-accelerator |archive-date=August 11, 2017 }}</ref> or computer system<ref>{{cite web |url=https://insidehpc.com/2017/06/inspurs-unveils-gx4-ai-accelerator/ |title=Inspurs unveils GX4 AI Accelerator |date=June 21, 2017}}</ref><ref>{{citation |title=Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors |last=Wiggers |first=Kyle |date=November 6, 2019 |url=https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |publication-date=November 6, 2019 |orig-date=2019 |archive-url=https://web.archive.org/web/20200306120524/https://venturebeat.com/2019/11/06/neural-magic-raises-15-million-to-boost-ai-training-speed-on-off-the-shelf-processors/ |archive-date=March 6, 2020 |access-date=March 14, 2020}}</ref> designed to accelerate [[artificial intelligence]] (AI) and [[machine learning]] applications, including [[artificial neural network]]s and [[computer vision]].

==Use==
NPUs are used either to efficiently execute already-trained AI models (inference) or to train AI models.
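Both inference and training are dominated by multiply-accumulate (MAC) operations: each output of a fully connected layer is a weighted sum of its inputs, and an NPU's job is to perform many such MACs in parallel. The following is a minimal, hypothetical Python sketch (the function name `dense_forward` and the toy weights are illustrative assumptions, not taken from any NPU toolkit) of a two-layer network evaluated one MAC at a time, counting the MACs as it goes. Throughput figures commonly count each MAC as two operations, a multiply and an add.

```python
from typing import List, Tuple


def dense_forward(x: List[float], weights: List[List[float]], bias: List[float],
                  relu: bool = True) -> Tuple[List[float], int]:
    """Compute W @ x + b (optionally followed by ReLU) one multiply-accumulate at a time.

    Returns the output vector and the number of multiply-accumulate (MAC) steps.
    """
    macs = 0
    out = []
    for row, b in zip(weights, bias):
        acc = b
        for w, xi in zip(row, x):
            acc += w * xi  # one MAC: multiply, then add into the accumulator
            macs += 1
        out.append(max(acc, 0.0) if relu else acc)
    return out, macs


# Hypothetical toy network: 4 inputs -> 3 hidden units (ReLU) -> 2 linear outputs.
x = [1.0, 2.0, 3.0, 4.0]
w1 = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, -1.0],
      [0.5, 0.5, 0.5, 0.5]]
b1 = [0.0, 0.0, 1.0]
w2 = [[1.0, 1.0, 1.0],
      [0.0, 0.0, -1.0]]
b2 = [0.0, 0.0]

h, m1 = dense_forward(x, w1, b1)               # 3 x 4 = 12 MACs
y, m2 = dense_forward(h, w2, b2, relu=False)   # 2 x 3 = 6 MACs
total_ops = 2 * (m1 + m2)  # counting each MAC as a multiply plus an add
print(h, y, m1 + m2, total_ops)
```

The MAC count of a fully connected layer is simply (outputs × inputs), so it grows with model size independently of the input values; this is why accelerators are designed around large arrays of identical MAC units.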
Their applications include [[algorithm]]s for [[robotics]], the [[Internet of things]], and other [[data (computing)|data]]-intensive or sensor-driven tasks.<ref>{{cite web |url=https://www.eetimes.com/google-designing-ai-processors/ |title=Google Designing AI Processors|date=May 18, 2016 }} Google is using its own AI accelerators.</ref> They are often [[Manycore processor|manycore]] or [[Spatial architecture|spatial]] designs and focus on [[precision (computer science)|low-precision]] arithmetic, novel [[dataflow architecture]]s, or [[in-memory computing]] capability. {{As of|2024}}, a typical datacenter-grade AI [[integrated circuit]] chip, such as the Nvidia H100 GPU, [[transistor count|contains tens of billions]] of [[MOSFET]]s (80 billion in the case of the H100).<ref>{{cite web|url=https://www.datacenterdynamics.com/en/news/nvidia-reveals-new-hopper-h100-gpu-with-80-billion-transistors/|title=Nvidia reveals new Hopper H100 GPU, with 80 billion transistors|last=Moss|first=Sebastian|date=2022-03-23|website=Data Center Dynamics|access-date=2024-01-30}}</ref>

=== Consumer devices ===
AI accelerators are used in mobile devices such as Apple [[iPhone]]s and [[Huawei]] and [[Google Pixel]] smartphones,<ref>{{Cite web|url=https://consumer.huawei.com/en/press/news/2017/ifa2017-kirin970|title=HUAWEI Reveals the Future of Mobile AI at IFA}}</ref> and are found in many [[Apple ~~Silicon~~silicon]], [[Qualcomm]], [[Samsung]], and [[Google Tensor]] smartphone processors.<ref>{{Cite web| title=Snapdragon 8 Gen 3 mobile platform | url=https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-url=https://web.archive.org/web/20231025162610/https://docs.qualcomm.com/bundle/publicresource/87-71408-1_REV_B_Snapdragon_8_gen_3_Mobile_Platform_Product_Brief.pdf | archive-date=2023-10-25}}</ref> AMD also includes [[AI engine|AI engines]] in its Versal devices and in its NPUs.<ref>{{Cite book |last=Brown |first=Nick |chapter=Exploring the Versal AI Engines for Accelerating Stencil-based Atmospheric Advection Simulation |date=2023-02-12 |title=Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays |chapter-url=https://dl.acm.org/doi/10.1145/3543622.3573047 |series=FPGA '23 |location=New York, NY, USA |publisher=Association for Computing Machinery |pages=91–97 |doi=10.1145/3543622.3573047 |isbn=978-1-4503-9417-8|arxiv=2301.13016 }}</ref> More recently (circa 2022), NPUs have been added to computer processors from [[Intel]],<ref>{{Cite web|url=https://www.intel.com/content/www/us/en/newsroom/news/intels-lunar-lake-processors-arriving-q3-2024.html|title=Intel's Lunar Lake Processors Arriving Q3 2024|website=Intel|date=May 20, 2024 }}</ref> [[AMD]],<ref>{{cite web|title=AMD XDNA Architecture|url=https://www.amd.com/en/technologies/xdna.html}}</ref> and Apple ([[Apple silicon]]).<ref>{{Cite web |title=Deploying Transformers on the Apple Neural Engine |url=https://machinelearning.apple.com/research/neural-engine-transformers |access-date=2023-08-24 |website=Apple Machine Learning Research |language=en-US}}</ref> All models of Intel [[Meteor Lake]] processors have a built-in ''versatile processor unit'' (''VPU'') for accelerating [[statistical inference|inference]] for computer vision and deep learning.<ref>{{Cite web|url=https://www.pcmag.com/news/intel-to-bring-a-vpu-processor-unit-to-14th-gen-meteor-lake-chips|title=Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips|website=PCMAG|date=August 2022 }}</ref>

On consumer devices, NPUs are intended to be small and power-efficient while remaining reasonably fast when running small models. To achieve this, they support low-bitwidth operations using data types such as INT4, INT8, FP8, and FP16.
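Models are usually prepared for these low-bitwidth data types by quantization: floating-point values are mapped onto a small integer range using a scale factor, the bulk of the arithmetic is done on integers, and the result is rescaled afterwards. The sketch below assumes symmetric, per-tensor INT8 quantization in plain Python; this is only one of several schemes (toolchains also use per-channel scales, zero points, and calibration data, and INT4 or FP8 use different encodings), and the function names are hypothetical rather than part of any vendor API.

```python
from typing import List, Tuple


def quantize_symmetric_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to int8 codes in [-127, 127] using one scale for the whole tensor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def int8_dot(a_codes: List[int], b_codes: List[int]) -> int:
    """Integer dot product of int8 codes, as integer MAC units would compute it."""
    acc = 0
    for a, b in zip(a_codes, b_codes):
        acc += a * b
    # Python ints never overflow; check the range a 32-bit hardware accumulator would have.
    assert -2**31 <= acc <= 2**31 - 1
    return acc


a = [0.5, -1.0, 0.25, 0.75]
b = [1.0, 0.5, -0.5, 2.0]
qa, sa = quantize_symmetric_int8(a)
qb, sb = quantize_symmetric_int8(b)

# sum((qa_i*sa) * (qb_i*sb)) == sa*sb*sum(qa_i*qb_i), so the integer result is rescaled once.
approx = int8_dot(qa, qb) * sa * sb
exact = sum(x * y for x, y in zip(a, b))
print(qa, qb, approx, exact)
```

Because both scales factor out of the sum, every multiply and add inside the loop is an 8-bit by 8-bit integer product with a wide integer accumulator, which is far cheaper in silicon area and energy than floating-point MACs; the price is a small rounding error in the rescaled result.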
A common performance metric is trillions of operations per second (TOPS), though this figure alone does not specify which kinds of operations, or at what precision, are being counted.<ref>{{cite web |title=A guide to AI TOPS and NPU performance metrics |url=https://www.qualcomm.com/news/onq/2024/04/a-guide-to-ai-tops-and-npu-performance-metrics}}</ref>

Line 20 ⟶ 21:
== Programming ==
Mobile NPU vendors typically provide their own [[application programming interface]], such as the Snapdragon Neural Processing Engine. An operating system or a higher-level library may provide a more generic interface, such as TensorFlow Lite with LiteRT Next (~~examples~~Android) ~~are~~or ~~for Android as~~CoreML (iOS, ~~has no equivalent public interface~~macOS).

Consumer CPU-integrated NPUs are accessible through vendor-specific APIs: AMD (Ryzen AI), Intel (OpenVINO), and Apple ~~Silicon~~silicon (CoreML){{efn|MLX builds atop the CPU and GPU parts, not the Apple Neural Engine (ANE) part of Apple silicon chips. Its relatively good performance is due to the use of a large, fast [[unified memory]] design.}} each provide their own, which higher-level libraries can build upon. GPUs generally use existing [[GPGPU]] pipelines such as CUDA and OpenCL, adapted for lower precisions. Custom-built systems such as the Google TPU use private interfaces.

==Notes==
{{notelist}}

== References ==
{{Reflist~~|32em~~}}

== External links ==