| released = <!-- {{Start date and age|YYYY|MM|DD|df=yes}} -->
| discontinued =
| latest release version = 3.
| latest release date = {{Start date and age|
| latest preview version =
| latest preview date = <!-- {{Start date and age|YYYY|MM|DD|df=yes}} -->
| license =
| alexa =
| website = {{URL|
| standard =
| AsOf =
'''Basic Linear Algebra Subprograms''' ('''BLAS''') is a [[specification (technical standard)|specification]] that prescribes a set of low-level routines for performing common [[linear algebra]] operations such as [[vector space|vector]] addition, [[scalar multiplication]], [[dot product]]s, linear combinations, and [[matrix multiplication]]. They are the ''[[de facto]]'' standard low-level routines for linear algebra libraries; the routines have bindings for both [[C (programming language)|C]] ("CBLAS interface") and [[Fortran]] ("BLAS interface"). Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. BLAS implementations take advantage of special floating-point hardware such as vector registers or [[SIMD]] instructions.
It originated as a Fortran library in 1979<ref name="lawson79">{{cite journal |last1=Lawson |first1=C. L. |last2=Hanson |first2=R. J. |last3=Kincaid |first3=D. |last4=Krogh |first4=F. T. |title=Basic Linear Algebra Subprograms for FORTRAN usage |journal=ACM Trans. Math. Softw. |volume=5 |issue=3 |pages=308–323 |date=1979 |id=Algorithm 539 |doi=10.1145/355841.355847 |hdl=2060/19780018835|s2cid=6585321 |hdl-access=free }}</ref> and its interface was standardized by the BLAS Technical (BLAST) Forum, whose latest BLAS report can be found on the [[netlib]] website.<ref>{{Cite web |url=
Most libraries that offer linear algebra routines conform to the BLAS interface, allowing library users to develop programs that are indifferent to the BLAS library being used. Examples of BLAS libraries include: [[OpenBLAS]], [[BLIS (software)|BLIS (BLAS-like Library Instantiation Software)]], Arm Performance Libraries,<ref name="Arm Performance Libraries">{{cite web|date=2020 |title=Arm Performance Libraries |publisher=[[Arm]] |url=https://www.arm.com/products/development-tools/server-and-hpc/allinea-studio/performance-libraries |access-date=2020-12-16}}</ref> [[Automatically Tuned Linear Algebra Software|ATLAS]], and [[Intel Math Kernel Library]] (MKL). AMD maintains a fork of BLIS that is optimized for the [[Advanced Micro Devices|AMD]] platform.<ref>{{Cite web|url=https://developer.amd.com/amd-aocl/blas-library/|title=BLAS Library}}</ref> ATLAS is a portable library that automatically optimizes itself for an arbitrary architecture. MKL is a freeware<ref name="MKLfree">{{cite web |date=2015 |title=No Cost Options for Intel Math Kernel Library (MKL), Support yourself, Royalty-Free |publisher=[[Intel]] |url=http://software.intel.com/articles/free_mkl |access-date=31 August 2015}}</ref> and proprietary<ref name="MKLintel">{{cite web |date=2015 |title=Intel Math Kernel Library (Intel MKL) |publisher=[[Intel]] |url=http://software.intel.com/intel-mkl |access-date=25 August 2015}}</ref> vendor library optimized for x86 and x86-64 with a performance emphasis on [[Intel]] processors.<ref name="optnotice">{{cite web |year=2012 |title=Optimization Notice |publisher=[[Intel]] |url=http://software.intel.com/articles/optimization-notice |access-date=10 April 2013}}</ref> OpenBLAS is an open-source library that is hand-optimized for many of the popular architectures. The [[LINPACK benchmarks]] rely heavily on the BLAS routine <code>[[General Matrix Multiply|gemm]]</code> for their performance measurements.
Many numerical software applications use BLAS-compatible libraries to do linear algebra computations, including [[LAPACK]], [[LINPACK]], [[Armadillo (C++ library)|Armadillo]], [[GNU Octave]], [[Mathematica]],<ref>{{cite journal |author=Douglas Quinney |date=2003 |title=So what's new in Mathematica 5.0? |journal=MSOR Connections |volume=3 |number=4 |publisher=The Higher Education Academy |url=http://78.158.56.101/archive/msor/headocs/34mathematica5.pdf |url-status=dead |archive-url=https://web.archive.org/web/20131029204826/http://78.158.56.101/archive/msor/headocs/34mathematica5.pdf |archive-date=2013-10-29 }}</ref> [[MATLAB]],<ref>{{cite web |author=Cleve Moler |date=2000 |title=MATLAB Incorporates LAPACK |publisher=[[MathWorks]] |url=http://www.mathworks.com/company/newsletters/articles/matlab-incorporates-lapack.html |access-date=26 October 2013}}</ref> [[NumPy]],<ref name="cise">{{cite journal |title=The NumPy array: a structure for efficient numerical computation |author=Stéfan van der Walt |author2=S. Chris Colbert |author3=Gaël Varoquaux |name-list-style=amp |date=2011 |journal=Computing in Science and Engineering |volume=13 |issue=2 |pages=22–30 |arxiv=1102.1523|bibcode=2011arXiv1102.1523V |doi=10.1109/MCSE.2011.37|s2cid=16907816 }}</ref> [[R (programming language)|R]], and [[Julia (programming language)|Julia]].
==Background==
===Level 3===
This level, formally published in 1990,<ref name="level3">{{Cite journal |last1=Dongarra |first1=Jack J. |last2=Du Croz |first2=Jeremy |last3=Hammarling |first3=Sven |last4=Duff |first4=Iain S. |title=A set of level 3 basic linear algebra subprograms |doi=10.1145/77626.79170 |date=1990 |journal=[[ACM Transactions on Mathematical Software]] |issn=0098-3500 |volume=16 |issue=1 |pages=1–17|s2cid=52873593 |doi-access=free }}</ref> contains ''matrix-matrix operations'', including a "general [[matrix multiplication]]" (<code>gemm</code>), of the form
:<math>\boldsymbol{C} \leftarrow \alpha \boldsymbol{A} \boldsymbol{B} + \beta \boldsymbol{C},</math>
; Accelerate: [[Apple Inc.|Apple]]'s framework for [[macOS]] and [[IOS (Apple)|iOS]], which includes tuned versions of [[BLAS]] and [[LAPACK]].<ref>{{Cite web|url=https://developer.apple.com/library/mac/#releasenotes/Performance/RN-vecLib/|title=Guides and Sample Code|website=developer.apple.com|access-date=2017-07-07}}</ref><ref>{{Cite web|url=https://developer.apple.com/library/ios/#documentation/Accelerate/Reference/AccelerateFWRef/|title=Guides and Sample Code|website=developer.apple.com|access-date=2017-07-07}}</ref>
; Arm Performance Libraries: [[Arm Performance Libraries]], supporting Arm 64-bit [[AArch64]]-based processors, available from [[Arm Ltd.|Arm]].<ref name="Arm Performance Libraries"/>
; ATLAS: [[Automatically Tuned Linear Algebra Software]], an [[Open-source software|open source]] implementation of BLAS [[application programming interface|API]]s for [[C (programming language)|C]] and [[Fortran|Fortran 77]].<ref>{{Cite web|url=
; [[BLIS (software)|BLIS]]: BLAS-like Library Instantiation Software framework for rapid instantiation. Optimized for most modern CPUs. BLIS is a complete refactoring of the GotoBLAS that reduces the amount of code that must be written for a given platform.
; C++ AMP BLAS: The [[C++ AMP]] BLAS Library is an [[Open-source software|open source]] implementation of BLAS for Microsoft's AMP language extension for Visual C++.<ref>{{Cite web|url=http://ampblas.codeplex.com/|title=C++ AMP BLAS Library|website=CodePlex|language=en|access-date=2017-07-07|archive-date=2017-07-08 |archive-url=https://web.archive.org/web/20170708151515/http://ampblas.codeplex.com/|url-status=dead}}</ref>
; cuBLAS: Optimized BLAS for [[Nvidia]] GPUs.
; NVBLAS: Optimized BLAS for [[Nvidia]] GPUs, providing only Level 3 functions as a direct drop-in replacement for other BLAS libraries.
; clBLAS: An [[OpenCL]] implementation of BLAS by AMD. Part of the AMD Compute Libraries.<ref name="github.com">{{Citation|title=clBLAS: a software library containing BLAS functions written in OpenCL|date=2017-07-03|url=https://github.com/clMathLibraries/clBLAS|publisher=clMathLibraries|access-date=2017-07-07}}</ref>
; clBLAST: A tuned [[OpenCL]] implementation of most of the BLAS API.<ref name="https://github.com/CNugteren/CLBlast">{{Citation|last=Nugteren|first=Cedric|title=CLBlast: Tuned OpenCL BLAS|date=2017-07-05|url=https://github.com/CNugteren/CLBlast|access-date=2017-07-07}}</ref>
; [[GNU Scientific Library]]: Multi-platform implementation of many numerical routines. Contains a CBLAS interface.
; HP MLIB: [[Hewlett-Packard|HP]]'s Math library supporting [[IA-64]], [[PA-RISC]], [[x86]] and [[Opteron]] architecture under [[HP-UX]] and [[Linux]].
; Intel MKL: The [[Intel]] [[Math Kernel Library]], supporting x86 32-bits and 64-bits, available free from [[Intel]].<ref name="MKLfree" /> Includes optimizations for Intel [[Pentium (brand)|Pentium]], [[Intel Core|Core]] and Intel [[Xeon]] CPUs and Intel [[Xeon Phi]]; support for [[Linux]], [[Microsoft Windows|Windows]] and [[macOS]].<ref>{{Cite web|url=
; MathKeisan: [[NEC Corporation|NEC]]'s math library, supporting [[NEC SX architecture]] under [[SUPER-UX]], and [[Itanium]] under [[Linux]]<ref>{{Cite web|url=
; Netlib BLAS: The official reference implementation on [[Netlib]], written in [[Fortran|Fortran 77]].<ref>{{Cite web|url=
; Netlib CBLAS: Reference [[C (programming language)|C]] interface to the BLAS. It is also possible (and popular) to call the Fortran BLAS from C.<ref>{{Cite web|url=
; [[OpenBLAS]]: Optimized BLAS based on GotoBLAS, supporting [[x86]], [[x86-64]], [[MIPS architecture|MIPS]] and [[ARM architecture|ARM]] processors.<ref>{{Cite web|url=
; PDLIB/SX: [[NEC Corporation|NEC]]'s Public Domain Mathematical Library for the NEC [[NEC SX architecture|SX-4]] system.<ref name=":0">{{cite web |url=http://www.nec.co.jp/hpc/mediator/sxm_e/software/61.html |title=
; rocBLAS: Implementation that runs on [[AMD]] GPUs via [[ROCm]].<ref>{{Cite web|url=https://rocmdocs.amd.com/en/latest/ROCm_Tools/rocblas.html|title=rocBLAS|website=rocmdocs.amd.com|access-date=2021-05-21|archive-date=2021-05-22 |archive-url=https://web.archive.org/web/20210522003949/https://rocmdocs.amd.com/en/latest/ROCm_Tools/rocblas.html|url-status=dead}}</ref>
; SCSL: [[Silicon Graphics|SGI]]'s Scientific Computing Software Library contains BLAS and LAPACK implementations for SGI's [[Irix]] workstations.<ref>{{cite web |url=http://www.sgi.com/products/software/scsl.html |title=
; Sun Performance Library: Optimized BLAS and LAPACK for [[SPARC]], [[Intel Core|Core]] and [[AMD64]] architectures under [[Oracle Solaris|Solaris]] 8, 9, and 10 as well as Linux.<ref>{{Cite web|url=
; uBLAS: A generic [[C++]] template class library providing BLAS functionality. Part of the [[Boost library]]. It provides bindings to many hardware-accelerated libraries in a unifying notation. Moreover, uBLAS focuses on correctness of the algorithms using advanced C++ features.<ref>{{Cite web|url=
=== Libraries using BLAS ===
; Armadillo: [[Armadillo (C++ library)|Armadillo]] is a C++ linear algebra library aiming towards a good balance between speed and ease of use. It employs template classes, and has optional links to BLAS/ATLAS and LAPACK. It is sponsored by [[NICTA]] (in Australia) and is licensed under a free license.<ref>{{Cite web|url=
; [[LAPACK]]: LAPACK is a higher-level linear algebra library built upon BLAS. Like BLAS, it has a reference implementation, as well as many alternatives such as libFlame and MKL.
; Mir: An [[LLVM]]-accelerated generic numerical library for science and machine learning written in [[D (programming language)|D]]. It provides generic linear algebra subprograms (GLAS). It can be built on a CBLAS implementation.<ref>{{Cite web|url=https://github.com/libmir|title= Dlang Numerical and System Libraries|website= [[GitHub]]}}</ref>
==Similar libraries (not compatible with BLAS)==
{{see also|LAPACK#Similar projects}}
; Elemental: Elemental is open-source software for [[distributed memory|distributed-memory]] dense and sparse-direct linear algebra and optimization.<ref>{{Cite web|url=
; HASEM: HASEM is a C++ template library that can solve linear equations and compute eigenvalues. It is licensed under the BSD License.<ref>{{Cite web|url=
; LAMA: The Library for Accelerated Math Applications ([[Library for Accelerated Math Applications|LAMA]]) is a C++ template library for writing numerical solvers targeting various kinds of hardware (e.g. [[GPU]]s through [[CUDA]] or [[OpenCL]]) on [[distributed memory]] systems, hiding the hardware-specific programming from the program developer.
; MTL4: The [[Matrix Template Library]] version 4 is a generic [[C++]] template library providing sparse and dense BLAS functionality. MTL4 establishes an intuitive interface (similar to [[MATLAB]]) and broad applicability thanks to [[generic programming]].
==Sparse BLAS==
Several extensions to BLAS for handling [[Sparse matrix|sparse matrices]] have been suggested over the course of the library's history; a small set of sparse matrix kernel routines was finally standardized in 2002.<ref>{{cite journal |first1=Iain S. |last1=Duff |first2=Michael A. |last2=Heroux |first3=Roldan |last3=Pozo |title=An Overview of the Sparse Basic Linear Algebra Subprograms: The New Standard from the BLAS Technical Forum |journal= ACM Transactions on Mathematical Software|year=2002 |volume=28 |issue=2 |pages=239–267 |doi=10.1145/567806.567810|s2cid=9411006 }}</ref>
==Batched BLAS==
The traditional BLAS functions have also been ported to architectures that support large amounts of parallelism such as [[GPUs]]. Here, the traditional BLAS functions typically provide good performance for large matrices. However, when computing, e.g., the matrix-matrix products of many small matrices with the GEMM routine, these architectures show significant performance losses. To address this issue, a batched version of the BLAS functions was specified in 2017.<ref name="dongarra17">{{cite journal |last1=Dongarra |first1=Jack |last2=Hammarling |first2=Sven |last3=Higham |first3=Nicholas J. |last4=Relton |first4=Samuel D. |last5=Valero-Lara |first5=Pedro |last6=Zounon |first6=Mawussi |title=The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems |journal=Procedia Computer Science |volume=108 |pages=495–504 |date=2017 |doi=10.1016/j.procs.2017.05.138|doi-access=free |hdl=2117/106913 |hdl-access=free }}</ref>
Taking the GEMM routine from above as an example, the batched version performs the following computation simultaneously for many matrices:
:<math>\boldsymbol{C}[k] \leftarrow \alpha \boldsymbol{A}[k] \boldsymbol{B}[k] + \beta \boldsymbol{C}[k] \quad \forall k</math>
The index <math>k</math> in square brackets indicates that the operation is performed for all matrices <math>k</math> in a stack. Often, this operation is implemented for a strided batched memory layout, where all matrices are stored concatenated in the arrays <math>A</math>, <math>B</math> and <math>C</math>.
Batched BLAS functions can be a versatile tool and allow, e.g., a fast implementation of [[exponential integrators]] and [[Magnus integrators]] that handle long integration periods with many time steps.<ref name="herb21">{{cite journal |last1=Herb |first1=Konstantin |last2=Welter |first2=Pol |title=Parallel time integration using Batched BLAS (Basic Linear Algebra Subprograms) routines |journal=Computer Physics Communications |volume=270 |pages=108181 |date=2022 |doi=10.1016/j.cpc.2021.108181 |arxiv=2108.07126|bibcode=2022CoPhC.27008181H |s2cid=237091802 }}</ref> Here, the [[matrix exponentiation]], the computationally expensive part of the integration, can be implemented in parallel for all time steps by using batched BLAS functions.
==See also==
==References==
{{reflist|refs=
<ref name="Kazushige_2008">{{cite journal |author-last1=Goto |author-first1=Kazushige |author-link1=Kazushige Goto |author-last2=van de Geijn |author-first2=Robert A. |author-link2=Robert van de Geijn |title=Anatomy of High-Performance Matrix Multiplication |date=2008 |journal=[[ACM Transactions on Mathematical Software]] |issn=0098-3500 |volume=34 |issue=3 |pages=12:1–12:25 |doi=10.1145/1356052.1356053 |citeseerx=10.1.1.111.3873|s2cid=9359223 }} (25 pages) [
<ref name="GotoBLAS2">{{cite web |title=GotoBLAS2 |author-first=Kent |author-last=Milfeld |publisher=[[Texas Advanced Computing Center]] |url=http://www.tacc.utexas.edu/tacc-software/gotoblas2 |access-date=
<ref name="Geijn_2008">{{cite journal |author-last1=Goto |author-first1=Kazushige |author-link1=Kazushige Goto |author-last2=van de Geijn |author-first2=Robert A. |author-link2=Robert van de Geijn |title=High-performance implementation of the level-3 BLAS |journal=[[ACM Transactions on Mathematical Software]] |volume=35 |issue=1 |pages=1–14 |date=2008 |doi=10.1145/1377603.1377607 |s2cid=14722514 |archive-url=https://web.archive.org/web/20170706142000/ftp://ftp.cs.utexas.edu/pub/techreports/tr06-23.pdf |archive-date=2017-07-06 |url-status=dead |url=ftp://ftp.cs.utexas.edu/pub/techreports/tr06-23.pdf}}</ref>
}}
* {{Citation |author=BLAST Forum |title=Basic Linear Algebra Subprograms Technical (BLAST) Forum Standard |date=21 August 2001 |publisher=University of Tennessee |___location=Knoxville, TN }}
* {{Citation |last1=Dodson |first1= D. S. |last2=Grimes |first2=R. G. |title=Remark on algorithm 539: Basic Linear Algebra Subprograms for Fortran usage |journal=ACM Trans. Math. Softw. |volume=8 |issue= 4 |pages=403–404 |year=1982 |doi= 10.1145/356012.356020|s2cid= 43081631 }}
* {{Citation |last=Dodson |first=D. S. |title=Corrigendum: Remark on "Algorithm 539: Basic Linear Algebra Subroutines for FORTRAN usage" |journal=ACM Trans. Math. Softw. |volume=9 |page=140 |year=1983 |doi= 10.1145/356022.356032|s2cid=22163977 |doi-access=free }}
* J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson, Algorithm 656: An extended set of FORTRAN Basic Linear Algebra Subprograms, ACM Trans. Math. Softw., 14 (1988), pp. 18–32.
* J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling, A set of Level 3 Basic Linear Algebra Subprograms, ACM Trans. Math. Softw., 16 (1990), pp. 1–17.
==External links==
* [
* [
* [
* [https://web.archive.org/web/20061009230911/http://history.siam.org/oralhistories/lawson.htm Lawson Oral History] One of the original authors of the BLAS discusses its creation in an oral history interview. Charles L. Lawson Oral history interview by Thomas Haigh, 6 and 7 November 2004, San Clemente, California. Society for Industrial and Applied Mathematics, Philadelphia, PA.
* [https://web.archive.org/web/20061009230904/http://history.siam.org/oralhistories/dongarra.htm Dongarra Oral History] In an oral history interview, Jack Dongarra explores the early relationship of BLAS to LINPACK, the creation of higher level BLAS versions for new architectures, and his later work on the ATLAS system to automatically optimize BLAS for particular machines. Jack Dongarra, Oral history interview by Thomas Haigh, 26 April 2005, University of Tennessee, Knoxville TN. Society for Industrial and Applied Mathematics, Philadelphia, PA