Basic Linear Algebra Subprograms

; [[BLIS (software)|BLIS]]: BLAS-like Library Instantiation Software framework for rapid instantiation. Optimized for most modern CPUs. BLIS is a complete refactoring of the GotoBLAS that reduces the amount of code that must be written for a given platform.<ref>{{Citation|title=blis: BLAS-like Library Instantiation Software Framework|date=2017-06-30|url=https://github.com/flame/blis|publisher=flame|access-date=2017-07-07}}</ref><ref>{{Citation|title=BLIS GitHub Repository|date=15 October 2021|url=https://github.com/flame/blis}}</ref>
; C++ AMP BLAS: The [[C++ AMP]] BLAS Library is an [[Open-source software|open source]] implementation of BLAS for Microsoft's AMP language extension for Visual C++.<ref>{{Cite web|url=http://ampblas.codeplex.com/|title=C++ AMP BLAS Library|website=CodePlex|language=en|access-date=2017-07-07|archive-date=2017-07-08 |archive-url=https://web.archive.org/web/20170708151515/http://ampblas.codeplex.com/|url-status=dead}}</ref>
; cuBLAS: Optimized BLAS for Nvidia-based GPU cards, requiring few additional library calls.<ref>{{Cite news|url=https://developer.nvidia.com/cublas|title=cuBLAS|date=2013-07-29|work=NVIDIA Developer|access-date=2017-07-07|language=en}}</ref>
; NVBLAS: Optimized BLAS for Nvidia-based GPU cards, providing only Level 3 functions, but as a direct drop-in replacement for other BLAS libraries.<ref>{{Cite news|url=https://docs.nvidia.com/cuda/nvblas/index.htmls|title=NVBLAS|date=2018-05-15|work=NVIDIA Developer|access-date=2018-05-15|language=en}}{{Dead link|date=September 2023 |bot=InternetArchiveBot |fix-attempted=yes }}</ref>
; clBLAS: An [[OpenCL]] implementation of BLAS by AMD. Part of the AMD Compute Libraries.<ref name="github.com">{{Citation|title=clBLAS: a software library containing BLAS functions written in OpenCL|date=2017-07-03|url=https://github.com/clMathLibraries/clBLAS|publisher=clMathLibraries|access-date=2017-07-07}}</ref>
; CLBlast: A tuned [[OpenCL]] implementation of most of the BLAS API.<ref name="https://github.com/CNugteren/CLBlast">{{Citation|last=Nugteren|first=Cedric|title=CLBlast: Tuned OpenCL BLAS|date=2017-07-05|url=https://github.com/CNugteren/CLBlast|access-date=2017-07-07}}</ref>
; [[OpenBLAS]]: Optimized BLAS based on GotoBLAS, supporting [[x86]], [[x86-64]], [[MIPS architecture|MIPS]] and [[ARM architecture|ARM]] processors.<ref>{{Cite web|url=https://www.openblas.net/|title=OpenBLAS : An optimized BLAS library|website=www.openblas.net|access-date=2017-07-07}}</ref> As with other conforming implementations, it can be driven through the standard C interface (CBLAS); a call sketch follows this list.
; PDLIB/SX: [[NEC Corporation|NEC]]'s Public Domain Mathematical Library for the NEC [[NEC SX architecture|SX-4]] system.<ref name=":0">{{cite web |url=http://www.nec.co.jp/hpc/mediator/sxm_e/software/61.html |title=PDLIB/SX: Business Solution &#124; NEC |access-date=2007-05-20 |url-status=dead |archive-url=https://web.archive.org/web/20070222154031/http://www.nec.co.jp/hpc/mediator/sxm_e/software/61.html |archive-date=2007-02-22 }}</ref>
; rocBLAS: Implementation that runs on [[AMD]] GPUs via [[ROCm]].<ref>{{Cite web|url=https://rocmdocs.amd.com/en/latest/ROCm_Tools/rocblas.html|title=rocBLAS|website=rocmdocs.amd.com|access-date=2021-05-21|archive-date=2021-05-22 |archive-url=https://web.archive.org/web/20210522003949/https://rocmdocs.amd.com/en/latest/ROCm_Tools/rocblas.html|url-status=dead}}</ref>
; SCSL: [[Silicon Graphics|SGI]]'s Scientific Computing Software Library contains BLAS and LAPACK implementations for SGI's [[Irix]] workstations.<ref>{{cite web |url=http://www.sgi.com/products/software/scsl.html |title=SGI - SCSL Scientific Library: Home Page |access-date=2007-05-20 |url-status=dead |archive-url=https://web.archive.org/web/20070513173030/http://www.sgi.com/products/software/scsl.html |archive-date=2007-05-13 }}</ref>
; Sun Performance Library: Optimized BLAS and LAPACK for [[SPARC]], [[Intel Core|Core]] and [[AMD64]] architectures under [[Oracle Solaris|Solaris]] 8, 9, and 10 as well as Linux.<ref>{{Cite web|url=https://www.oracle.com/technetwork/server-storage/solarisstudio/overview/index.html|title=Oracle Developer Studio|website=www.oracle.com|access-date=2017-07-07}}</ref>
; uBLAS: A generic [[C++]] template class library providing BLAS functionality. Part of the [[Boost library]]. It provides bindings to many hardware-accelerated libraries in a unifying notation. Moreover, uBLAS focuses on the correctness of the algorithms, using advanced C++ features.<ref>{{Cite web|url=https://www.boost.org/doc/libs/1_60_0/libs/numeric/ublas/doc/index.html|title=Boost Basic Linear Algebra - 1.60.0|website=www.boost.org|access-date=2017-07-07}}</ref>
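
All of the implementations above provide (at least a subset of) the same routine set, so application code written against the standard interfaces can usually be relinked against a different library unchanged. As a minimal, library-agnostic sketch (assuming a CBLAS header and any linkable BLAS; not specific to any one library above), a double-precision matrix product through the Level 3 routine <code>cblas_dgemm</code>:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <cblas.h>

/* C = alpha*A*B + beta*C for small row-major matrices; which library
 * executes the call is decided at link time, not in the source code. */
int main(void)
{
    double A[2 * 3] = {1, 2, 3,
                       4, 5, 6};        /* 2x3 */
    double B[3 * 2] = { 7,  8,
                        9, 10,
                       11, 12};         /* 3x2 */
    double C[2 * 2] = {0};              /* 2x2 result */

    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                2, 2, 3,                /* M, N, K       */
                1.0, A, 3,              /* alpha, A, lda */
                     B, 2,              /* B, ldb        */
                0.0, C, 2);             /* beta, C, ldc  */

    printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  /* 58 64 / 139 154 */
    return 0;
}
</syntaxhighlight>

Linking against a different implementation, e.g. <code>cc example.c -lopenblas</code> instead of the reference <code>-lblas</code>, changes the performance characteristics but not the call.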
 
The index <math>k</math> in square brackets indicates that the operation is performed for all matrices <math>k</math> in a stack. Often, this operation is implemented for a strided batched memory layout in which all matrices are stored concatenated in the arrays <math>A</math>, <math>B</math> and <math>C</math>.
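
A minimal sketch of this layout, assuming a CBLAS interface and column-major storage; the function name and its stride parameters are illustrative rather than standardized (vendor libraries such as cuBLAS offer a fused equivalent, e.g. <code>cublasDgemmStridedBatched</code>):

<syntaxhighlight lang="c">
#include <cblas.h>

/* Strided batched GEMM: C[k] = alpha*A[k]*B[k] + beta*C[k] for
 * k = 0 .. batch-1, where the k-th matrix starts at A + k*strideA,
 * B + k*strideB and C + k*strideC. Here the batch loop is spelled
 * out with plain CBLAS; an optimized library fuses it into one call. */
void dgemm_strided_batched(int m, int n, int p, double alpha,
                           const double *A, int lda, long strideA,
                           const double *B, int ldb, long strideB,
                           double beta,
                           double *C, int ldc, long strideC,
                           int batch)
{
    for (int k = 0; k < batch; ++k)
        cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                    m, n, p, alpha,
                    A + (long)k * strideA, lda,
                    B + (long)k * strideB, ldb,
                    beta,
                    C + (long)k * strideC, ldc);
}
</syntaxhighlight>

Because the <math>k</math>-th matrix sits at a fixed offset, an implementation can dispatch all multiplications in the stack concurrently instead of looping as above.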
 
Batched BLAS functions can be a versatile tool and allow, for example, fast implementations of [[exponential integrators]] and [[Magnus integrators]] that handle long integration periods with many time steps.<ref name="herb21">{{cite journal |last1=Herb |first1=Konstantin |last2=Welter |first2=Pol |title=Parallel time integration using Batched BLAS (Basic Linear Algebra Subprograms) routines |journal=Computer Physics Communications |volume=270 |pages=108181 |date=2022 |doi=10.1016/j.cpc.2021.108181 |arxiv=2108.07126|bibcode=2022CoPhC.27008181H |s2cid=237091802 }}</ref> Here, the [[matrix exponentiation]], the computationally expensive part of the integration, can be carried out in parallel for all time steps by using batched BLAS functions.
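
As an illustration of this idea (not the algorithm of the cited paper), the following sketch approximates the matrix exponential of every matrix in a stack with a truncated [[Taylor series]], reusing the strided batched layout from above; <code>batched_expm_taylor</code> is a hypothetical name, and production codes use more robust schemes such as scaling and squaring:

<syntaxhighlight lang="c">
#include <stdlib.h>
#include <string.h>
#include <cblas.h>

/* Illustrative only: approximate E[k] = exp(A[k]) for `batch` n-by-n
 * column-major matrices stored back to back (stride n*n), using the
 * truncated Taylor series I + A + A^2/2! + ... up to `terms`. */
static void batched_expm_taylor(const double *A, double *E,
                                int n, int batch, int terms)
{
    const long stride = (long)n * n;                   /* offset between matrices */
    double *T  = malloc(batch * stride * sizeof *T);   /* current Taylor term     */
    double *Tn = malloc(batch * stride * sizeof *Tn);  /* next Taylor term        */

    /* Zeroth-order term: E[k] = T[k] = I. */
    memset(E, 0, batch * stride * sizeof *E);
    memset(T, 0, batch * stride * sizeof *T);
    for (int k = 0; k < batch; ++k)
        for (int i = 0; i < n; ++i)
            E[k * stride + i * n + i] = T[k * stride + i * n + i] = 1.0;

    for (int m = 1; m <= terms; ++m) {
        for (int k = 0; k < batch; ++k) {
            /* Tn[k] = (1/m) * A[k] * T[k], i.e. the next Taylor term. */
            cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans,
                        n, n, n, 1.0 / m,
                        A + k * stride, n, T + k * stride, n,
                        0.0, Tn + k * stride, n);
            /* E[k] += Tn[k]. */
            cblas_daxpy((int)stride, 1.0, Tn + k * stride, 1, E + k * stride, 1);
        }
        double *swap = T; T = Tn; Tn = swap;  /* new term becomes the current one */
    }
    free(T);
    free(Tn);
}
</syntaxhighlight>

With a batched BLAS, the inner loop over <code>k</code> collapses into one batched GEMM and one batched update per term, which is what enables the parallelism over time steps described above.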
 
==See also==
<ref name="Kazushige_2008">{{cite journal |author-last1=Goto |author-first1=Kazushige |author-link1=Kazushige Goto |author-last2=van de Geijn |author-first2=Robert A. |author-link2=Robert van de Geijn |title=Anatomy of High-Performance Matrix Multiplication |date=2008 |journal=[[ACM Transactions on Mathematical Software]] |issn=0098-3500 |volume=34 |issue=3 |pages=12:1–12:25 |doi=10.1145/1356052.1356053 |citeseerx=10.1.1.111.3873|s2cid=9359223 }} (25 pages) [https://www.cs.utexas.edu/~flame/web/FLAMEPublications.html#Goto]</ref>
<ref name="GotoBLAS2">{{cite web |title=GotoBLAS2 |author-first=Kent |author-last=Milfeld |publisher=[[Texas Advanced Computing Center]] |url=http://www.tacc.utexas.edu/tacc-software/gotoblas2 |access-date=2024-03-17 |url-status=dead |archive-url=https://web.archive.org/web/20200323172521/https://www.tacc.utexas.edu/research-development/tacc-software/gotoblas2 |archive-date=2020-03-23}}</ref>
<ref name="Geijn_2008">{{cite journal |author-last1=Goto |author-first1=Kazushige |author-link1=Kazushige Goto |author-last2=van de Geijn |author-first2=Robert A. |author-link2=Robert van de Geijn |title=High-performance implementation of the level-3 BLAS |journal=[[ACM Transactions on Mathematical Software]] |volume=35 |issue=1 |pages=1–14 |date=2008 |doi=10.1145/1377603.1377607 |s2cid=14722514 |archive-url=https://web.archive.org/web/20170706142000/ftp://ftp.cs.utexas.edu/pub/techreports/tr06-23.pdf |archive-date=2017-07-06 |url-status=dead |url=ftp://ftp.cs.utexas.edu/pub/techreports/tr06-23.pdf}}</ref>
}}