Most libraries that offer linear algebra routines conform to the BLAS interface, allowing library users to write programs that do not depend on which BLAS library is used. The use of BLAS implementations has grown considerably with the development of [[GPGPU]]; GPU libraries such as [[CUDA#Programming abilities|cuBLAS]] and [[ROCm#rocBLAS / hipBLAS|rocBLAS]] are prime examples. CPU-based BLAS libraries include [[OpenBLAS]], [[BLIS (software)|BLIS (BLAS-like Library Instantiation Software)]], Arm Performance Libraries,<ref name="Arm Performance Libraries">{{cite web|date=2020 |title=Arm Performance Libraries |publisher=[[Arm (company)|Arm]] |url=https://www.arm.com/products/development-tools/server-and-hpc/allinea-studio/performance-libraries |access-date=2020-12-16}}</ref> [[Automatically Tuned Linear Algebra Software|ATLAS]], and [[Intel Math Kernel Library]] (MKL). AMD maintains a fork of BLIS that is optimized for the [[Advanced Micro Devices|AMD]] platform.<ref>{{Cite web|url=https://developer.amd.com/amd-aocl/blas-library/|title=BLAS Library}}</ref> ATLAS is a portable library that automatically optimizes itself for an arbitrary architecture.
MKL is a freeware<ref name="MKLfree">{{cite web |date=2015 |title=No Cost Options for Intel Math Kernel Library (MKL), Support yourself, Royalty-Free |publisher=[[Intel]] |url=http://software.intel.com/articles/free_mkl |access-date=31 August 2015}}</ref> and proprietary<ref name="MKLintel">{{cite web |date=2015 |title=Intel Math Kernel Library (Intel MKL) |publisher=[[Intel]] |url=http://software.intel.com/intel-mkl |access-date=25 August 2015}}</ref> vendor library optimized for x86 and x86-64 with a performance emphasis on [[Intel]] processors.<ref name="optnotice">{{cite web |year=2012 |title=Optimization Notice |publisher=[[Intel]] |url=http://software.intel.com/articles/optimization-notice |access-date=10 April 2013}}</ref> OpenBLAS is an open-source library that is hand-optimized for many popular architectures. The [[LINPACK benchmarks]] rely heavily on the BLAS routine <code>[[General Matrix Multiply|gemm]]</code> for their performance measurements.
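For reference, the operation computed by the BLAS routine <code>gemm</code> in its no-transpose case is <math>C \leftarrow \alpha AB + \beta C</math>, where the matrices are stored in column-major order with leading dimensions <code>lda</code>, <code>ldb</code> and <code>ldc</code>. The Python sketch below is a naive illustration of those semantics only, not code from any library named above; the function name and the use of flat Python lists are assumptions made for illustration. Optimized libraries such as OpenBLAS compute the same result using cache blocking and vectorized kernels.

```python
def gemm(m, n, k, alpha, a, lda, b, ldb, beta, c, ldc):
    """Naive sketch of C := alpha*A*B + beta*C (no transposes).

    A is m-by-k, B is k-by-n and C is m-by-n, all stored as flat lists in
    column-major order: element (i, j) of X with leading dimension ldx is
    X[i + j*ldx]. C is updated in place and also returned.
    """
    for j in range(n):
        for i in range(m):
            acc = 0.0
            for p in range(k):
                acc += a[i + p * lda] * b[p + j * ldb]
            idx = i + j * ldc
            # Following the BLAS convention, C is not read when beta is zero.
            if beta == 0.0:
                c[idx] = alpha * acc
            else:
                c[idx] = alpha * acc + beta * c[idx]
    return c
```

For example, the 2×2 matrices with rows (1, 2), (3, 4) and (5, 6), (7, 8), stored column-major as <code>[1, 3, 2, 4]</code> and <code>[5, 7, 6, 8]</code>, give <code>[19, 43, 22, 50]</code> for <math>\alpha = 1</math> and <math>\beta = 0</math>, that is, the product with rows (19, 22) and (43, 50).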
Many numerical software applications use BLAS-compatible libraries to do linear algebra computations, including [[LAPACK]], [[LINPACK]], [[Armadillo (C++ library)|Armadillo]], [[GNU Octave]], [[Mathematica]],<ref>{{cite journal |author=Douglas Quinney |date=2003 |title=So what's new in Mathematica 5.0? |journal=MSOR Connections |volume=3 |number=4 |publisher=The Higher Education Academy |url=http://78.158.56.101/archive/msor/headocs/34mathematica5.pdf |url-status=dead |archive-url=https://web.archive.org/web/20131029204826/http://78.158.56.101/archive/msor/headocs/34mathematica5.pdf |archive-date=2013-10-29 }}</ref> [[MATLAB]],<ref>{{cite web |author=Cleve Moler |date=2000 |title=MATLAB Incorporates LAPACK |publisher=[[MathWorks]] |url=http://www.mathworks.com/company/newsletters/articles/matlab-incorporates-lapack.html |access-date=26 October 2013}}</ref> [[NumPy]],<ref name="cise">{{cite journal |title=The NumPy array: a structure for efficient numerical computation |author=Stéfan van der Walt |author2=S. Chris Colbert |author3=Gaël Varoquaux |name-list-style=amp |date=2011 |journal=Computing in Science and Engineering |volume=13 |issue=2 |pages=22–30 |arxiv=1102.1523|bibcode=
==Background==
The index <math>k</math> in square brackets indicates that the operation is performed for all matrices <math>k</math> in a stack. Often, this operation is implemented for a strided batched memory layout, in which the matrices of each stack are stored one after another in the arrays <math>A</math>, <math>B</math> and <math>C</math>.
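The strided batched layout can be illustrated with a naive reference loop. The sketch below assumes that the batched operation is the GEMM update <math>C[k] \leftarrow \alpha A[k] B[k] + \beta C[k]</math>, with column-major matrices stored back to back so that matrix <math>k</math> of each array begins at offset <math>k</math> times a fixed stride. The function and parameter names are hypothetical (the shared inner dimension is called <code>inner</code> to keep <code>k</code> free for the batch index); GPU libraries expose this layout through routines such as cuBLAS's <code>cublasDgemmStridedBatched</code>, which takes one stride argument for each of the three arrays.

```python
def gemm_strided_batched(m, n, inner, alpha, a, lda, stride_a,
                         b, ldb, stride_b, beta, c, ldc, stride_c, batch_count):
    """Naive sketch of C[k] := alpha*A[k]*B[k] + beta*C[k] for every k.

    A[k] is m-by-inner, B[k] is inner-by-n and C[k] is m-by-n, each stored
    column-major. All matrices of one stack live in a single flat list, with
    matrix k starting at offset k*stride_x. C is updated in place.
    """
    for k in range(batch_count):
        # Offsets of matrix k inside each concatenated array.
        oa, ob, oc = k * stride_a, k * stride_b, k * stride_c
        for j in range(n):
            for i in range(m):
                acc = 0.0
                for p in range(inner):
                    acc += a[oa + i + p * lda] * b[ob + p + j * ldb]
                idx = oc + i + j * ldc
                # When beta is zero, C[k] is overwritten without being read.
                c[idx] = alpha * acc if beta == 0.0 else alpha * acc + beta * c[idx]
    return c
```

Because each matrix is located by a single multiplication of the batch index by a stride, a library can launch the whole stack as one call instead of issuing one <code>gemm</code> call per matrix.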
Batched BLAS functions are a versatile tool; for example, they allow fast implementations of [[exponential integrators]] and [[Magnus integrators]] that handle long integration periods with many time steps.<ref name="herb21">{{cite journal |last1=Herb |first1=Konstantin |last2=Welter |first2=Pol |title=Parallel time integration using Batched BLAS (Basic Linear Algebra Subprograms) routines |journal=Computer Physics Communications |volume=270 |pages=108181 |date=2022 |doi=10.1016/j.cpc.2021.108181 |arxiv=2108.07126|bibcode=2022CoPhC.27008181H |s2cid=237091802 }}</ref> Here, the [[matrix exponentiation]], which is the computationally expensive part of the integration, can be computed in parallel for all time steps by using Batched BLAS functions.
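The idea of exponentiating the matrices of all time steps at once can be sketched in pure Python. The example below approximates <math>\exp(A[k])</math> for every matrix in a stack by scaling and squaring with a truncated Taylor series, a standard technique chosen here for brevity; it is not necessarily the method used in the cited work. The helper <code>batched_matmul</code> is a hypothetical stand-in for a single Batched BLAS GEMM call, so every step of the algorithm acts on the whole stack. One shared scaling exponent is used for the entire stack so that each squaring remains a single batched product.

```python
import math


def matmul(a, b):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][p] * b[p][j] for p in range(n)) for j in range(n)]
            for i in range(n)]


def batched_matmul(As, Bs):
    """Hypothetical stand-in for one Batched BLAS call: A[k] @ B[k] for all k."""
    return [matmul(a, b) for a, b in zip(As, Bs)]


def batched_expm(As, terms=12):
    """Approximate exp(A[k]) for every matrix in the stack (scaling and squaring)."""
    n = len(As[0])
    # Shared scaling exponent s: scale every matrix so that its infinity norm
    # (largest absolute row sum) is at most 0.5.
    norm = max(max(sum(abs(x) for x in row) for row in a) for a in As)
    s = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    scale = 2.0 ** (-s)
    scaled = [[[x * scale for x in row] for row in a] for a in As]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Truncated Taylor series: sum of M^j / j! for j = 0..terms, term by term.
    result = [[row[:] for row in identity] for _ in As]
    term = [[row[:] for row in identity] for _ in As]
    for j in range(1, terms + 1):
        term = batched_matmul(term, scaled)
        term = [[[x / j for x in row] for row in t] for t in term]
        result = [[[r + t for r, t in zip(rr, tr)] for rr, tr in zip(R, T)]
                  for R, T in zip(result, term)]
    # Undo the scaling: exp(A) = exp(A / 2^s)^(2^s), one batched squaring per step.
    for _ in range(s):
        result = batched_matmul(result, result)
    return result
```

In an integrator, the stack would hold one matrix per time step (for instance, the step size times the system matrix at that step, an assumption made here for illustration), and in practice the Python loops would be replaced by calls to a Batched BLAS library.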
==See also==