Heterogeneous System Architecture: Difference between revisions

{{Short description|Computing system}}
'''Heterogeneous System Architecture''' ('''HSA''') is a cross-vendor set of specifications that allow for the integration of [[central processing unit]]s and [[GPU|graphics processors]] on the same bus, with shared [[Main memory|memory]] and [[Task (computing)|tasks]].<ref>{{cite web |url=http://www.tomshardware.com/news/AMD-HSA-hUMA-APU,22324.html |title=AMD Unveils its Heterogeneous Uniform Memory Access (hUMA) Technology |website=Tom's Hardware |author=Tarun Iyer |date=30 April 2013}}</ref> The HSA is being developed by the [[HSA Foundation]], which includes (among many others) [[Advanced Micro Devices|AMD]] and [[ARM Holdings|ARM]]. The platform's stated aim is to reduce [[communication latency]] between CPUs, GPUs and other [[compute device]]s, and make these various devices more compatible from a programmer's perspective,<ref name="whitepaper">{{Cite report |author=George Kyriazis |date=30 August 2012 |title=Heterogeneous System Architecture: A Technical Review |url=http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/hsa10.pdf |publisher=AMD |access-date=26 May 2014 |archive-url=https://web.archive.org/web/20140328140823/http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/hsa10.pdf |archive-date=28 March 2014 |url-status=dead }}</ref>{{rp|3}}<ref name="whatis">{{cite web |title=What is Heterogeneous System Architecture (HSA)? 
|url=http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/ |publisher=AMD |access-date=23 May 2014 |archive-url=https://web.archive.org/web/20140621213832/http://developer.amd.com/resources/heterogeneous-computing/what-is-heterogeneous-system-architecture-hsa/ |archive-date=21 June 2014 |url-status=dead }}</ref> relieving the programmer of the task of planning data movement between devices' disjoint memories (as must currently be done with [[OpenCL]] or [[CUDA]]).<ref>{{cite web |author=Joel Hruska |title=Setting HSAIL: AMD explains the future of CPU/GPU cooperation |url=http://www.extremetech.com/gaming/164817-setting-hsail-amd-cpu-gpu-cooperation |website=[[ExtremeTech]] |publisher=[[Ziff Davis]] |date=2013-08-26}}</ref>
 
CUDA and OpenCL as well as most other fairly advanced programming languages can use HSA to increase their execution performance.<ref>{{cite web|url=http://www.slideshare.net/mobile/linaroorg/hsa-linaro-updatejuly102013|title=LCE13: Heterogeneous System Architecture (HSA) on ARM|author=Linaro|work=slideshare.net|date=21 March 2014}}</ref> [[Heterogeneous computing]] is widely used in [[MPSoC|system-on-chip]] devices such as [[Tablet computer|tablets]], [[smartphone]]s, other mobile devices, and [[video game console]]s.<ref name="gpuscience">{{cite web
| url = http://gpuscience.com/cs/heterogeneous-system-architecture-purpose-and-outlook/
| archive-url = https://web.archive.org/web/20140201183411/http://gpuscience.com/cs/heterogeneous-system-architecture-purpose-and-outlook/
| title = Heterogeneous System Architecture: Purpose and Outlook
| date = 2012-11-09 | access-date = 2014-05-24
| archive-date = 2014-02-01
| website = gpuscience.com
}}</ref> HSA allows programs to use the graphics processor for [[floating point]] calculations without separate memory or scheduling.<ref>{{cite web |title=Heterogeneous system architecture: Multicore image processing using a mix of CPU and GPU elements |website=Embedded Computing Design |url=http://embedded-computing.com/articles/heterogeneous-processing-using-mix-cpu-gpu-elements/ |access-date=23 May 2014}}</ref>
 
==Rationale==
| height = 190
| align = center
| lines = 3
 
| File:HSA – using the GPU without HSA.svg
| Steps performed when offloading calculations to the [[Graphics processing unit|GPU]] on a non-HSA system
 
==Overview==
{{More citations needed section|date=May 2014}}
Originally introduced by [[embedded system]]s such as the [[Cell Broadband Engine]], sharing system memory directly between multiple system actors makes heterogeneous computing more mainstream. Heterogeneous computing itself refers to systems that contain multiple processing units{{snd}} [[central processing unit]]s (CPUs), [[graphics processing unit]]s (GPUs), [[digital signal processor]]s (DSPs), or any type of [[application-specific integrated circuit]]s (ASICs). The system architecture allows any accelerator, for instance a [[GPU|graphics processor]], to operate at the same processing level as the system's CPU.
 
 
===HSA Intermediate Layer===<!--incoming redirect-->
HSAIL (Heterogeneous System Architecture Intermediate Language), a [[p-code machine|virtual instruction set]] for parallel programs
* similar{{according to whom|date=May 2015}} to [[LLVM Intermediate Representation]] and [[Standard Portable Intermediate Representation|SPIR]] (used by [[OpenCL]] and [[Vulkan (API)|Vulkan]])
* finalized to a specific instruction set by a [[Just-in-time compilation|JIT compiler]]
 
===Block diagrams===
The illustrations below compare CPU-GPU coordination under HSA versus under traditional architectures.
 
{{Gallery
| height = 190
| align = center
| lines = 3
 
| File:Desktop computer bus bandwidths.svg
| Standard architecture with a discrete [[graphics card|GPU]] attached to the [[PCI Express]] bus. [[Zero-copy]] between the GPU and CPU is not possible due to distinct physical memories.
 
|File:HSA-enabled virtual memory with distinct graphics card.svg
| HSA brings unified virtual memory, and facilitates passing pointers over PCI Express instead of copying the entire data.
 
| File:Integrated graphics with distinct memory allocation.svg
| In partitioned main memory, one part of the system memory is exclusively allocated to the GPU. As a result, zero-copy operation is not possible.
 
| File:HSA-enabled integrated graphics.svg
| Unified main memory, where GPU and CPU are HSA-enabled. This makes zero-copy operation possible.<ref>{{cite web |url=http://www.semiaccurate.com/2014/01/15/technical-look-amds-kaveri-architecture/ |title=Kaveri microarchitecture |date=2014-01-15 |work=[[SemiAccurate]]}}</ref>
 
| File:MMU and IOMMU.svg
| The CPU's [[Memory management unit|MMU]] and the GPU's [[IOMMU]] must both comply with the HSA hardware specifications.
}}
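
The difference between copy-based offloading and zero-copy operation described in the captions above can be illustrated with a small conceptual model. This is only a sketch: ordinary Python buffers stand in for CPU and GPU memory, and the function names <code>offload_with_copy</code> and <code>offload_zero_copy</code> are hypothetical and do not come from the HSA specifications.

```python
# Conceptual model only: plain Python buffers stand in for CPU and GPU memory.
# None of these names come from the HSA specifications.

def offload_with_copy(host_data: bytearray) -> bytearray:
    """Partitioned or discrete memory: copy in, compute, copy back."""
    device_buffer = bytearray(host_data)       # host -> device copy
    for i in range(len(device_buffer)):        # "kernel" works on the device copy
        device_buffer[i] = (device_buffer[i] * 2) % 256
    return bytearray(device_buffer)            # device -> host copy


def offload_zero_copy(shared: memoryview) -> None:
    """Unified memory: the "device" receives a reference and works in place."""
    for i in range(len(shared)):
        shared[i] = (shared[i] * 2) % 256


host = bytearray([1, 2, 3, 4])
copied_result = offload_with_copy(host)        # two full copies of the data were made
offload_zero_copy(memoryview(host))            # only a reference was passed
```

In the first call the input is duplicated twice, while in the second the same bytes are updated in place, which is the property the unified-memory diagrams depict.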
 
| url = https://www.phoronix.com/scan.php?page=news_item&px=MTc0NTk
| title = AMDKFD Driver Still Evolving For Open-Source HSA On Linux
| date = 21 July 2014 | access-date = 21 January 2015
| author = Michael Larabel | publisher = [[Phoronix]]
}}</ref><ref name="kernelnewbies-3.19" />]]
 
Some of the HSA-specific features implemented in the hardware need to be supported by the [[operating system kernel]] and specific device drivers. For example, support for AMD [[Radeon]] and [[AMD FirePro]] graphics cards, and [[AMD Accelerated Processing Unit|APUs]] based on [[Graphics Core Next]] (GCN), was merged into version 3.19 of the [[Linux kernel mainline]], released on 8 February 2015.<ref name="kernelnewbies-3.19">{{cite web
| url = http://kernelnewbies.org/Linux_3.19#head-ae54e026ef7588f4431f7e94178d27d5cd830bbf
| title = Linux kernel 3.19, Section 1.3. HSA driver for AMD GPU devices
| date = 8 February 2015 | access-date = 12 February 2015
| website = kernelnewbies.org
}}</ref> Programs do not interact directly with {{Mono|amdkfd}}{{Explain|date=December 2023}}, but queue their jobs utilizing the HSA runtime.<ref>{{cite web
| url = https://github.com/HSAFoundation/HSA-Runtime-Reference-Source/blob/master/README.md
| title = HSA-Runtime-Reference-Source/README.md at master
| date = 14 November 2014 | access-date = 12 February 2015
| website = github.com
}}</ref> This very first implementation, known as {{Mono|amdkfd}}, focuses on [[AMD Accelerated Processing Unit#Steamroller architecture .282014.29: Kaveri|"Kaveri"]] or "Berlin" APUs and works alongside the existing Radeon kernel graphics driver.
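
The idea that programs queue jobs through a runtime rather than calling the kernel driver directly can be sketched as follows. This is a toy model under stated assumptions, not the HSA runtime API: the names <code>DispatchPacket</code>, <code>UserModeQueue</code>, <code>enqueue</code> and <code>drain</code> are hypothetical, and <code>drain</code> merely stands in for a device consuming packets in submission order.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class DispatchPacket:
    """Hypothetical job descriptor: a kernel plus the data it operates on."""
    kernel: Callable[[List[int]], None]
    data: List[int]


class UserModeQueue:
    """Toy stand-in for a runtime-managed queue; not the HSA runtime API."""

    def __init__(self) -> None:
        self._packets: Deque[DispatchPacket] = deque()

    def enqueue(self, packet: DispatchPacket) -> None:
        # The application only appends a packet; it never calls the driver.
        self._packets.append(packet)

    def drain(self) -> int:
        # Stands in for the device consuming packets in submission order.
        executed = 0
        while self._packets:
            packet = self._packets.popleft()
            packet.kernel(packet.data)
            executed += 1
        return executed


def double_in_place(values: List[int]) -> None:
    for i, value in enumerate(values):
        values[i] = value * 2


queue = UserModeQueue()
job_data = [1, 2, 3]
queue.enqueue(DispatchPacket(double_in_place, job_data))
queue.enqueue(DispatchPacket(double_in_place, job_data))
jobs_run = queue.drain()
```

The application code touches only the queue object, which mirrors the separation between user programs and the kernel driver described above.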
 
Additionally, {{Mono|amdkfd}} supports ''heterogeneous queuing'' (HQ), which aims to simplify the distribution of computational jobs among multiple CPUs and GPUs from the programmer's perspective. Support for ''heterogeneous memory management'' (''HMM''), suited only for graphics hardware featuring version 2 of AMD's [[IOMMU]], was accepted into the Linux kernel mainline version 4.14.<ref>{{cite web|url=https://www.xda-developers.com/linux-kernel-414/|archive-url=https://web.archive.org/web/20171113231202/https://www.xda-developers.com/linux-kernel-414/|url-status=dead|archive-date=13 November 2017|title=Linux Kernel 4.14 Announced with Secure Memory Encryption and More|date=13 November 2017}}</ref>
 
Integrated support for HSA platforms has been announced for the "Sumatra" release of [[OpenJDK]], due in 2015.<ref>{{cite web |url=http://www.hpcwire.com/2013/08/26/hsa_foundation_aims_to_boost_javas_gpu_prowess/ |title=HSA Foundation Aims to Boost Java's GPU Prowess |author=Alex Woodie |date=26 August 2013 |website=HPCwire}}</ref>
 
[[AMD APP SDK]] is AMD's proprietary software development kit targeting [[parallel computing]], available for Microsoft Windows and Linux. Bolt is a C++ template library optimized for heterogeneous computing.<ref>{{cite web |url=https://github.com/HSA-Libraries/Bolt |title=Bolt on github|website=[[GitHub]]|date=11 January 2022}}</ref>
 
[[GPUOpen]] includes a couple of other software tools related to HSA. [[CodeXL]] version 2.0 includes an HSA profiler.<ref>{{cite web |url=http://gpuopen.com/codexl-2-0-is-here-and-open-source/ |title=CodeXL 2.0 includes HSA profiler |author=AMD GPUOpen |date=2016-04-19 |access-date=21 April 2016 |archive-date=27 June 2018 |archive-url=https://web.archive.org/web/20180627034628/https://gpuopen.com/codexl-2-0-is-here-and-open-source/ |url-status=dead }}</ref>
 
{{Clear}}
==Hardware support==
===AMD===
{{As of|2015|2}}, only AMD's "Kaveri" A-series APUs (cf. [[List of AMD Accelerated Processing Unit microprocessors#"Kaveri" (2014, 28 nm)|"Kaveri" desktop processors]] and [[List of AMD Accelerated Processing Unit microprocessors#"Kaveri" 2014, 28 nm|"Kaveri" mobile processors]]) and Sony's [[PlayStation 4]] allowed the [[Graphics processing unit#Integrated graphics|integrated GPU]] to access memory via version 2 of AMD's IOMMU. Earlier APUs (Trinity and Richland) included the version 2 IOMMU functionality, but only for use by an external GPU connected via PCI Express.{{Citation needed|date=June 2016}}
 
Post-2015 Carrizo and Bristol Ridge APUs also include the version 2 IOMMU functionality for the integrated GPU.{{Citation needed|date=June 2016}}
 
===ARM===
ARM's [[Bifrost (microarchitecture)|Bifrost]] microarchitecture, as implemented in the Mali-G71,<ref>{{cite web |url=http://www.anandtech.com/show/10375/arm-unveils-bifrost-and-mali-g71/5 |archive-url=https://archive.today/20160910101608/http://www.anandtech.com/show/10375/arm-unveils-bifrost-and-mali-g71/5 |url-status=dead |archive-date=10 September 2016 |title=ARM Bifrost GPU Architecture |date=2016-05-30}}</ref> is fully compliant with the HSA 1.1 hardware specifications. {{As of|2016|6}}, ARM has not announced software support that would use this hardware feature.
 
==See also==
* [[Shared memory]]
* [[Zero-copy]]
* A technique enabling zero-copy operation for a CPU and a parallel accelerator <ref> Computer memory architecture for hybrid serial and parallel computing systems, US patents 7,707,388, 2010 and 8,145,879, 2012. Inventor: [[Uzi Vishkin]] </ref>
 
==References==
{{Commons category}}
* {{YouTube|id=ln8JpfaLvbM|title=HSA Heterogeneous System Architecture Overview}} by Vinod Tipparaju at [[ACM/IEEE Supercomputing Conference|SC13]] in November 2013
* [https://web.archive.org/web/20160514070602/http://www.mpsoc-forum.org/previous/2013/slides/8-Hegde.pdf HSA and the software ecosystem]
* [http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1400_MichaelHouston.pdf 2012 – HSA by Michael Houston] {{Webarchive|url=https://web.archive.org/web/20160305141652/http://www-conf.slac.stanford.edu/xldb2012/talks/xldb2012_wed_1400_MichaelHouston.pdf |date=5 March 2016 }}
{{Use dmy dates|date=July 2019}}
 
[[Category:Heterogeneous System Architecture| ]]