Heterogeneous System Architecture: Difference between revisions

Revision as of 16:10, 21 September 2017 by PekkaJ: syscalls are not (so far) included in HSA PRM specs
Latest revision as of 05:37, 6 August 2025 by GreenC bot: Rescued 1 archive link. Wayback Medic 2.5 per WP:URLREQ#anandtech.com
(28 intermediate revisions by 20 users not shown)
Line 1: {{Short description\|Computing system}} '''Heterogeneous System Architecture''' ('''HSA''') is a cross-vendor set of specifications that allow for the integration of [[central processing unit]]s and [[GPU\|graphics processors]] on the same bus, with shared [[Main memory\|memory]] and [[Task (computing)\|tasks]].<ref>{{cite web \|url=http://www.tomshardware.com/news/AMD-HSA-hUMA-APU,22324.html \|title=AMD Unveils its Heterogeneous Uniform Memory Access (hUMA) Technology \|website=Tom's Hardware \|author=Tarun Iyer \|date=30 April 2013}}</ref> The HSA is being developed by the [[HSA Foundation]], which includes (among many others) [[Advanced Micro Devices\|AMD]] and [[ARM Holdings\|ARM]]. The platform's stated aim is to reduce [[communication latency]] between CPUs, GPUs and other [[compute device]]s, and make these various devices more compatible from a programmer's perspective,<ref name="whitepaper">{{Cite report \|author=George Kyriazis \|date=30 August 2012 \|title=Heterogeneous System Architecture: A Technical Review \|url=http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/hsa10.pdf \|publisher=AMD \|access-date=26 May 2014 \|archive-url=https://web.archive.org/web/20140328140823/http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/hsa10.pdf \|archive-date=28 March 2014 \|url-status=dead }}</ref>{{rp\|3}}<ref name="whatis">{{cite web \|title=What is Heterogeneous System Architecture (HSA)? 
\|url=http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/ \|publisher=AMD \|access-date=23 May 2014 \|archive-url=https://web.archive.org/web/20140621213832/http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/ \|archive-date=21 June 2014 \|url-status=dead }}</ref> relieving the programmer of the task of planning the moving of data between devices' disjoint memories (as must currently be done with [[OpenCL]] or [[CUDA]]).<ref>{{cite web \|author=Joel Hruska \|title=Setting HSAIL: AMD explains the future of CPU/GPU cooperation \|url=http://www.extremetech.com/gaming/164817-setting-hsail-amd-cpu-gpu-cooperation \|website=[[ExtremeTech]] \|publisher=[[Ziff Davis]] \|date=2013-08-26}}</ref> CUDA and OpenCL as well as most other fairly advanced programming languages can use HSA to increase their execution performance.<ref>{{cite web\|url=http://www.slideshare.net/mobile/linaroorg/hsa-linaro-updatejuly102013\|title=LCE13: Heterogeneous System Architecture (HSA) on ARM\|author=Linaro\|work=slideshare.net\|date=21 March 2014}}</ref> [[Heterogeneous computing]] is widely used in [[MPSoC\|system-on-chip]] devices such as [[Tablet computer\|tablets]], [[smartphone]]s, other mobile devices, and [[video game console]]s.<ref name="gpuscience">{{cite web \| url = http://gpuscience.com/cs/heterogeneous-system-architecture-purpose-and-outlook/ \| archive-url = https://web.archive.org/web/20140201183411/http://gpuscience.com/cs/heterogeneous-system-architecture-purpose-and-outlook/ \| title = Heterogeneous System Architecture: Purpose and Outlook \| date = 2012-11-09 \| access-date = 2014-05-24 \| archive-date = 2014-02-01 \| website = gpuscience.com }}</ref> HSA allows programs to use the graphics processor for [[floating point]] calculations without separate memory or scheduling.<ref>{{cite web \|title=Heterogeneous
system architecture: Multicore image processing using a mix of CPU and GPU elements \|website=Embedded Computing Design \|url=http://embedded-computing.com/articles/heterogeneous-processing-using-mix-cpu-gpu-elements/ \|access-date=23 May 2014}}</ref>

==Rationale==

Line 17 ⟶ 18:
\| height = 190
\| align = center
\| File:HSA – using the GPU without HSA.svg \| Steps performed when offloading calculations to the [[Graphics processing unit\|GPU]] on a non-HSA system

Line 29 ⟶ 28:
==Overview==
{{More citations needed section\|date=May 2014}}
Originally introduced by [[embedded system]]s such as the [[Cell Broadband Engine]], sharing system memory directly between multiple system actors makes heterogeneous computing more mainstream. Heterogeneous computing itself refers to systems that contain multiple processing units{{snd}} [[central processing unit]]s (CPUs), [[graphics processing unit]]s (GPUs), [[digital signal processor]]s (DSPs), or any type of [[application-specific integrated circuit]]s (ASICs). The system architecture allows any accelerator, for instance a [[GPU\|graphics processor]], to operate at the same processing level as the system's CPU.

Line 37 ⟶ 36:
===HSA Intermediate Layer===<!--incoming redirect-->
HSAIL (Heterogeneous System Architecture Intermediate Language), a [[p-code machine\|virtual instruction set]] for parallel programs
* similar{{according to whom\|date=May 2015}} to [[LLVM Intermediate Representation]] and [[Standard Portable Intermediate Representation\|SPIR]] (used by [[OpenCL]] and [[Vulkan (API)\|Vulkan]])
* finalized to a specific instruction set by a [[Just-in-time compilation\|JIT compiler]]

Line 59 ⟶ 58:
===Block diagrams===
The illustrations below compare CPU-GPU coordination under HSA versus under traditional architectures.
{{Gallery
Line 65 ⟶ 64:
\| height = 190
\| align = center
\| File:Desktop computer bus bandwidths.svg \| Standard architecture with a discrete [[graphics card\|GPU]] attached to the [[PCI Express]] bus. [[Zero-copy]] between the GPU and CPU is not possible due to distinct physical memories.
\|File:HSA-enabled virtual memory with distinct graphics card.svg \| HSA brings unified virtual memory, and facilitates passing pointers over PCI Express instead of copying the data.
\| File:Integrated graphics with distinct memory allocation.svg \| In partitioned main memory, one part of the system memory is exclusively allocated to the GPU. As a result, zero-copy operation is not possible.
\| File:HSA-enabled integrated graphics.svg \| Unified main memory, where GPU and CPU are HSA-enabled. This makes zero-copy operation possible.<ref>{{cite web \|url=http://www.semiaccurate.com/2014/01/15/technical-look-amds-kaveri-architecture/ \|title=Kaveri microarchitecture \|date=2014-01-15 \|work=[[SemiAccurate]]}}</ref>
\| File:MMU and IOMMU.svg \| The CPU's [[Memory management unit\|MMU]] and the GPU's [[IOMMU]] must both comply with HSA hardware specifications.
}}

Line 89 ⟶ 86:
\| url = https://www.phoronix.com/scan.php?page=news_item&px=MTc0NTk \| title = AMDKFD Driver Still Evolving For Open-Source HSA On Linux \| date = 21 July 2014 \| access-date = 21 January 2015 \| author = Michael Larabel \| publisher = [[Phoronix]] }}</ref><ref name="kernelnewbies-3.19" />]]
Some of the HSA-specific features implemented in the hardware need to be supported by the [[operating system kernel]] and specific device drivers.
For example, support for AMD [[Radeon]] and [[AMD FirePro]] graphics cards, and [[AMD Accelerated Processing Unit\|APUs]] based on [[Graphics Core Next]] (GCN), was merged into version 3.19 of the [[Linux kernel mainline]], released on 8 February 2015.<ref name="kernelnewbies-3.19">{{cite web \| url = http://kernelnewbies.org/Linux_3.19#head-ae54e026ef7588f4431f7e94178d27d5cd830bbf \| title = Linux kernel 3.19, Section 1.3. HSA driver for AMD GPU devices \| date = 8 February 2015 \| access-date = 12 February 2015 \| website = kernelnewbies.org }}</ref> Programs do not interact directly with {{Mono\|amdkfd}}{{Explain\|date=December 2023}}, but queue their jobs using the HSA runtime.<ref>{{cite web \| url = https://github.com/HSAFoundation/HSA-Runtime-Reference-Source/blob/master/README.md \| title = HSA-Runtime-Reference-Source/README.md at master \| date = 14 November 2014 \| access-date = 12 February 2015 \| website = github.com }}</ref> This very first implementation, known as {{Mono\|amdkfd}}, focuses on [[AMD Accelerated Processing Unit#Steamroller architecture .282014.29: Kaveri\|"Kaveri"]] or "Berlin" APUs and works alongside the existing Radeon kernel graphics driver. Additionally, {{Mono\|amdkfd}} supports ''heterogeneous queuing'' (HQ), which aims to simplify the distribution of computational jobs among multiple CPUs and GPUs from the programmer's perspective.
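The queuing and zero-copy ideas described above can be illustrated with a minimal, CPU-only sketch. It does not use the HSA runtime or {{Mono\|amdkfd}}; the names <code>Agent</code>, <code>dispatch</code>, <code>double</code> and <code>increment</code> are hypothetical and exist only for this illustration. Jobs are placed on one shared queue and handed to stand-in "devices", each of which works in place on a view of the same buffer instead of on a copy.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical stand-in for a compute device (CPU core, GPU, DSP)."""
    name: str
    completed: list = field(default_factory=list)

    def run(self, kernel, buffer):
        # The kernel mutates the shared buffer in place: nothing is copied.
        kernel(buffer)
        self.completed.append(kernel.__name__)


def double(buffer):
    """Illustrative kernel: double every byte (mod 256)."""
    for i in range(len(buffer)):
        buffer[i] = (buffer[i] * 2) % 256


def increment(buffer):
    """Illustrative kernel: add one to every byte (mod 256)."""
    for i in range(len(buffer)):
        buffer[i] = (buffer[i] + 1) % 256


def dispatch(jobs, agents):
    """Drain one shared job queue, assigning jobs to agents round-robin."""
    queue = deque(jobs)
    turn = 0
    while queue:
        kernel, buffer = queue.popleft()
        agents[turn % len(agents)].run(kernel, buffer)
        turn += 1


# One buffer in "unified" memory; memoryview slices share its storage.
shared = bytearray([1, 2, 3, 4])
view = memoryview(shared)

cpu, gpu = Agent("cpu"), Agent("gpu")
dispatch([(double, view[:2]), (increment, view[2:])], [cpu, gpu])
```

Because each job receives a <code>memoryview</code> slice, the results appear directly in <code>shared</code> without any copy back, which is the property the unified-memory diagrams attribute to HSA. A real system would dispatch to actual devices through the runtime rather than through a Python loop.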
Support for ''heterogeneous memory management'' (''HMM''), suited only for graphics hardware featuring version 2 of AMD's [[IOMMU]], was accepted into the Linux kernel mainline version 4.14.<ref>{{cite web\|url=https://www.xda-developers.com/linux-kernel-414/\|archive-url=https://web.archive.org/web/20171113231202/https://www.xda-developers.com/linux-kernel-414/\|url-status=dead\|archive-date=13 November 2017\|title=Linux Kernel 4.14 Announced with Secure Memory Encryption and More\|date=13 November 2017}}</ref> Integrated support for HSA platforms has been announced for the "Sumatra" release of [[OpenJDK]], due in 2015.<ref>{{cite web \|url=http://www.hpcwire.com/2013/08/26/hsa_foundation_aims_to_boost_javas_gpu_prowess/ \|title=HSA Foundation Aims to Boost Java's GPU Prowess \|author=Alex Woodie \|date=26 August 2013 \|website=HPCwire}}</ref> [[AMD APP SDK]] is AMD's proprietary software development kit targeting [[parallel computing]], available for Microsoft Windows and Linux. Bolt is a C++ template library optimized for heterogeneous computing.<ref>{{cite web \|url=https://github.com/HSA-Libraries/Bolt \|title=Bolt on github\|website=[[GitHub]]\|date=11 January 2022}}</ref> [[GPUOpen]] includes several other software tools related to HSA. [[CodeXL]] version 2.0 includes an HSA profiler.<ref>{{cite web \|url=http://gpuopen.com/codexl-2-0-is-here-and-open-source/ \|title=CodeXL 2.0 includes HSA profiler \|author=AMD GPUOpen \|date=2016-04-19 \|access-date=21 April 2016 \|archive-date=27 June 2018 \|archive-url=https://web.archive.org/web/20180627034628/https://gpuopen.com/codexl-2-0-is-here-and-open-source/ \|url-status=dead }}</ref> {{Clear}}

Line 117 ⟶ 114:
==Hardware support==
===AMD===
{{As of\|2015\|2}}, only AMD's "Kaveri" A-series APUs (cf.
[[List of AMD Accelerated Processing Unit microprocessors#"Kaveri" (2014, 28 nm)\|"Kaveri" desktop processors]] and [[List of AMD Accelerated Processing Unit microprocessors#"Kaveri" 2014, 28 nm\|"Kaveri" mobile processors]]) and Sony's [[PlayStation 4]] allowed the [[Graphics processing unit#Integrated graphics\|integrated GPU]] to access memory via version 2 of AMD's IOMMU. Earlier APUs (Trinity and Richland) included the version 2 IOMMU functionality, but only for use by an external GPU connected via PCI Express.{{Citation needed\|date=June 2016}} Post-2015 Carrizo and Bristol Ridge APUs also include the version 2 IOMMU functionality for the integrated GPU.{{Citation needed\|date=June 2016}}

Line 124 ⟶ 121:
===ARM===
ARM's [[Bifrost (microarchitecture)\|Bifrost]] microarchitecture, as implemented in the Mali-G71,<ref>{{cite web \|url=http://www.anandtech.com/show/10375/arm-unveils-bifrost-and-mali-g71/5 \|archive-url=https://archive.today/20160910101608/http://www.anandtech.com/show/10375/arm-unveils-bifrost-and-mali-g71/5 \|url-status=dead \|archive-date=10 September 2016 \|title=ARM Bifrost GPU Architecture \|date=2016-05-30}}</ref> is fully compliant with the HSA 1.1 hardware specifications. {{As of\|2016\|6}}, ARM has not announced software support that would use this hardware feature.

==See also==
Line 132 ⟶ 129:
* [[Shared memory]]
* [[Zero-copy]]
* A technique enabling zero-copy operation for a CPU and a parallel accelerator <ref> Computer memory architecture for hybrid serial and parallel computing systems, US patents 7,707,388, 2010 and 8,145,879, 2012.
Inventor: [[Uzi Vishkin]] </ref> ==References== Line 139 ⟶ 137: {{Commons category}} * {{YouTube\|id=ln8JpfaLvbM\|title=HSA Heterogeneous System Architecture Overview}} by Vinod Tipparaju at [[ACM/IEEE Supercomputing Conference\|SC13]] in November 2013 * [https://web.archive.org/web/20160514070602/http://www.mpsoc-forum.org/previous/2013/slides/8-Hegde.pdf HSA and the software ecosystem] * [http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1400_MichaelHouston.pdf 2012 – HSA by Michael Houston] {{Webarchive\|url=https://web.archive.org/web/20160305141652/http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1400_MichaelHouston.pdf \|date=5 March 2016 }} {{Use dmy dates\|date=July 2019}} [[Category:Heterogeneous System Architecture\| ]]