Most computing libraries that offer linear algebra routines conform to the common BLAS interface, so calls to those libraries (and their results) are often portable between BLAS implementations, such as [[CUDA#Programming_abilities|cuBLAS]] (Nvidia GPU, [[GPGPU]]), [[ROCm#rocBLAS_/_hipBLAS|rocBLAS]] (AMD GPU, GPGPU), and [[OpenBLAS]]. This interoperability allows a single code base to run across heterogeneous computing architectures (such as those found in some advanced clustering implementations). Examples of CPU-based BLAS implementations include: [[OpenBLAS]], [[BLIS (software)|BLIS (BLAS-like Library Instantiation Software)]], Arm Performance Libraries,<ref name="Arm Performance Libraries">{{cite web|date=2020 |title=Arm Performance Libraries |publisher=[[Arm]] |url=https://www.arm.com/products/development-tools/server-and-hpc/allinea-studio/performance-libraries |access-date=2020-12-16}}</ref> [[Automatically Tuned Linear Algebra Software|ATLAS]], and [[Intel Math Kernel Library]] (iMKL). AMD maintains a fork of BLIS that is optimized for the [[Advanced Micro Devices|AMD]] platform.<ref>{{Cite web|url=https://developer.amd.com/amd-aocl/blas-library/|title=BLAS Library}}</ref> ATLAS is a portable library that automatically optimizes itself for an arbitrary architecture.
iMKL is a freeware<ref name="MKLfree">{{cite web |date=2015 |title=No Cost Options for Intel Math Kernel Library (MKL), Support yourself, Royalty-Free |publisher=[[Intel]] |url=http://software.intel.com/articles/free_mkl |access-date=31 August 2015}}</ref> and proprietary<ref name="MKLintel">{{cite web |date=2015 |title=Intel Math Kernel Library (Intel MKL) |publisher=[[Intel]] |url=http://software.intel.com/intel-mkl |access-date=25 August 2015}}</ref> vendor library optimized for x86 and x86-64 with a performance emphasis on [[Intel]] processors.<ref name="optnotice">{{cite web |year=2012 |title=Optimization Notice |publisher=[[Intel]] |url=http://software.intel.com/articles/optimization-notice |access-date=10 April 2013}}</ref> OpenBLAS is an open-source library that is hand-optimized for many of the popular architectures. The [[LINPACK benchmarks]] rely heavily on the BLAS routine <code>[[General Matrix Multiply|gemm]]</code> for their performance measurements.
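For illustration, the semantics of the <code>gemm</code> operation, <math>C \leftarrow \alpha AB + \beta C</math>, can be sketched in pure Python. This is an unoptimized reference, not how any vendor library implements it; real BLAS routines also take memory-layout and transpose flags that are omitted here:

```python
# Reference (unoptimized) sketch of the BLAS GEMM operation:
#   C <- alpha * A @ B + beta * C
# Matrices are nested lists; real [sd]gemm routines work on flat
# arrays with leading-dimension and transpose parameters.
def gemm(alpha, A, B, beta, C):
    m, k = len(A), len(B)
    n = len(B[0])
    for i in range(m):
        for j in range(n):
            acc = sum(A[i][p] * B[p][j] for p in range(k))
            C[i][j] = alpha * acc + beta * C[i][j]
    return C
```

Optimized implementations replace these naive loops with blocked, vectorized kernels tuned to each architecture's cache hierarchy, which is where the libraries above differ.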
Many numerical software applications use BLAS-compatible libraries to do linear algebra computations, including [[LAPACK]], [[LINPACK]], [[Armadillo (C++ library)|Armadillo]], [[GNU Octave]], [[Mathematica]],<ref>{{cite journal |author=Douglas Quinney |date=2003 |title=So what's new in Mathematica 5.0? |journal=MSOR Connections |volume=3 |number=4 |publisher=The Higher Education Academy |url=http://78.158.56.101/archive/msor/headocs/34mathematica5.pdf |url-status=dead |archive-url=https://web.archive.org/web/20131029204826/http://78.158.56.101/archive/msor/headocs/34mathematica5.pdf |archive-date=2013-10-29 }}</ref> [[MATLAB]],<ref>{{cite web |author=Cleve Moler |date=2000 |title=MATLAB Incorporates LAPACK |publisher=[[MathWorks]] |url=http://www.mathworks.com/company/newsletters/articles/matlab-incorporates-lapack.html |access-date=26 October 2013}}</ref> [[NumPy]],<ref name="cise">{{cite journal |title=The NumPy array: a structure for efficient numerical computation |author=Stéfan van der Walt |author2=S. Chris Colbert |author3=Gaël Varoquaux |name-list-style=amp |date=2011 |journal=Computing in Science and Engineering |volume=13 |issue=2 |pages=22–30 |arxiv=1102.1523|bibcode=
==Background==
; Netlib CBLAS: Reference [[C (programming language)|C]] interface to the BLAS. It is also possible (and popular) to call the Fortran BLAS from C.<ref>{{Cite web|url=http://www.netlib.org/blas|title=BLAS (Basic Linear Algebra Subprograms)|website=www.netlib.org|access-date=2017-07-07}}</ref>
; [[OpenBLAS]]: Optimized BLAS based on GotoBLAS, supporting [[x86]], [[x86-64]], [[MIPS architecture|MIPS]] and [[ARM architecture|ARM]] processors.<ref>{{Cite web|url=http://www.openblas.net/|title=OpenBLAS : An optimized BLAS library|website=www.openblas.net|access-date=2017-07-07}}</ref>
; PDLIB/SX: [[NEC Corporation|NEC]]'s Public Domain Mathematical Library for the NEC [[NEC SX architecture|SX-4]] system.<ref name=":0">{{cite web |url=http://www.nec.co.jp/hpc/mediator/sxm_e/software/61.html |title=
; rocBLAS: Implementation that runs on [[AMD]] GPUs via [[ROCm]].<ref>{{Cite web|url=https://rocmdocs.amd.com/en/latest/ROCm_Tools/rocblas.html|title=rocBLAS|website=rocmdocs.amd.com|access-date=2021-05-21}}</ref>
; SCSL: [[Silicon Graphics|SGI]]'s Scientific Computing Software Library contains BLAS and LAPACK implementations for SGI's [[Irix]] workstations.<ref>{{cite web |url=http://www.sgi.com/products/software/scsl.html |title=
; Sun Performance Library: Optimized BLAS and LAPACK for [[SPARC]], [[Intel Core|Core]] and [[AMD64]] architectures under Solaris 8, 9, and 10 as well as Linux.<ref>{{Cite web|url=http://www.oracle.com/technetwork/server-storage/solarisstudio/overview/index.html|title=Oracle Developer Studio|website=www.oracle.com|access-date=2017-07-07}}</ref>
; uBLAS: A generic [[C++]] template class library providing BLAS functionality. Part of the [[Boost library]]. It provides bindings to many hardware-accelerated libraries in a unifying notation. Moreover, uBLAS focuses on correctness of the algorithms using advanced C++ features.<ref>{{Cite web|url=http://www.boost.org/doc/libs/1_60_0/libs/numeric/ublas/doc/index.html|title=Boost Basic Linear Algebra - 1.60.0|website=www.boost.org|access-date=2017-07-07}}</ref>
; Armadillo: [[Armadillo (C++ library)|Armadillo]] is a C++ linear algebra library aiming towards a good balance between speed and ease of use. It employs template classes, and has optional links to BLAS/ATLAS and LAPACK. It is sponsored by [[NICTA]] (in Australia) and is licensed under a free license.<ref>{{Cite web|url=http://arma.sourceforge.net/|title=Armadillo: C++ linear algebra library|website=arma.sourceforge.net|access-date=2017-07-07}}</ref>
; [[LAPACK]]: LAPACK is a higher level Linear Algebra library built upon BLAS. Like BLAS, a reference implementation exists, but many alternatives like libFlame and MKL exist.
; Mir: An [[LLVM]]-accelerated generic numerical library for science and machine learning written in [[D (programming language)|D]]. It provides generic linear algebra subprograms (GLAS). It can be built on a CBLAS implementation.<ref>{{Cite web|url=https://github.com/libmir|title= Dlang Numerical and System Libraries|website= [[GitHub]]}}</ref>
==Similar libraries (not compatible with BLAS)==
{{see also|LAPACK#Similar projects}}
; Elemental: Elemental is open-source software for [[distributed memory|distributed-memory]] dense and sparse-direct linear algebra and optimization.<ref>{{Cite web|url=http://libelemental.org/|title=Elemental: distributed-memory dense and sparse-direct linear algebra and optimization — Elemental|website=libelemental.org|access-date=2017-07-07}}</ref>
; HASEM: A C++ template library that can solve linear equations and compute eigenvalues. It is licensed under the BSD License.<ref>{{Cite web|url=http://sourceforge.net/projects/hasem/|title=HASEM|website=SourceForge|date=17 August 2015 |language=en|access-date=2017-07-07}}</ref>
; LAMA: The Library for Accelerated Math Applications ([[Library for Accelerated Math Applications|LAMA]]) is a C++ template library for writing numerical solvers targeting various kinds of hardware (e.g. [[GPU]]s through [[CUDA]] or [[OpenCL]]) on [[distributed memory]] systems, hiding the hardware-specific programming from the program developer.
; MTL4: The [[Matrix Template Library]] version 4 is a generic [[C++]] template library providing sparse and dense BLAS functionality. MTL4 establishes an intuitive interface (similar to [[MATLAB]]) and broad applicability thanks to [[generic programming]].
==Batched BLAS==
The traditional BLAS functions have also been ported to architectures that support large amounts of parallelism such as [[GPUs]]. Here, the traditional BLAS functions typically provide good performance for large matrices. However, when computing, e.g., the matrix products of many small matrices with the GEMM routine, those architectures show significant performance losses. To address this issue, a batched version of the BLAS functions was specified in 2017.<ref name="dongarra17">{{cite journal |last1=Dongarra |first1=Jack |last2=Hammarling |first2=Sven |last3=Higham |first3=Nicholas J. |last4=Relton |first4=Samuel D. |last5=Valero-Lara |first5=Pedro |last6=Zounon |first6=Mawussi |title=The Design and Performance of Batched BLAS on Modern High-Performance Computing Systems |journal=Procedia Computer Science |volume=108 |pages=
Taking the GEMM routine from above as an example, the batched version performs the following computation simultaneously for many matrices:
:<math>C[k] \leftarrow \alpha A[k] B[k] + \beta C[k]</math>
The index <math>k</math> in square brackets indicates that the operation is performed for all matrices <math>k</math> in a stack. Often, this operation is implemented for a strided batched memory layout where all matrices are stored concatenated in the arrays <math>A</math>, <math>B</math> and <math>C</math>.
Batched BLAS functions can be a versatile tool and allow e.g. a fast implementation of [[exponential integrators]] and [[Magnus integrators]] that handle long integration periods with many time steps.<ref name="herb21">{{cite journal |last1=Herb |first1=Konstantin |last2=Welter |first2=Pol |title=Parallel time integration using Batched BLAS (Basic Linear Algebra Subprograms) routines |journal=Computer Physics Communications |volume=270 |pages=108181 |date=2022 |doi=10.1016/j.cpc.2021.108181 |arxiv=2108.07126|s2cid=237091802 }}</ref> Here, the [[matrix exponentiation]], the computationally expensive part of the integration, can be implemented in parallel for all time-steps by using Batched BLAS functions.
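As a minimal sketch of the expensive kernel involved, the matrix exponential can be approximated with a truncated Taylor series built entirely from matrix multiplications, which is what makes it amenable to batched GEMM calls (production integrators use more robust schemes such as scaling and squaring; this pure-Python version only shows the structure):

```python
# Hedged sketch: exp(A) via a truncated Taylor series,
#   exp(A) ~ I + A + A^2/2! + ... ,
# expressed purely through matrix multiplies. In a batched setting,
# each multiply would be one batched GEMM over all time steps at once.
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][p] * B[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(A, terms=20):
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # I
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in matmul(term, A)]  # A^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result
```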
==See also==