Message Passing Interface: Difference between revisions

MPI provides parallel hardware vendors with a clearly defined base set of routines that can be efficiently implemented. As a result, hardware vendors can build upon this collection of standard [[Low-level programming language|low-level]] routines to create [[High-level programming language|higher-level]] routines for the distributed-memory communication environment supplied with their [[parallel machine]]s. MPI provides a simple-to-use portable interface for the basic user, yet one powerful enough to allow programmers to use the high-performance message passing operations available on advanced machines.
 
In an effort to create a universal standard for message passing, researchers did not base it on a single system; instead, they incorporated the most useful features of several systems, including those designed by IBM, [[Intel]], [[nCUBE]], [[Parallel Virtual Machine|PVM]], Express, P4 and PARMACS. The message-passing paradigm is attractive because of its wide portability: it can be used in communication for distributed-memory and shared-memory multiprocessors, networks of workstations, and combinations of these elements. The paradigm can apply in multiple settings, independent of network speed or memory architecture.
 
Support for MPI meetings came in part from [[DARPA]] and from the U.S. [[National Science Foundation]] (NSF) under grant ASC-9310330, NSF Science and Technology Center Cooperative agreement number CCR-8809615, and from the [[European Commission]] through Esprit Project P6643. The [[University of Tennessee]] also made financial contributions to the MPI Forum.
MPI is a [[communication protocol]] for programming<ref>{{cite book |first=Frank |last=Nielsen | title=Introduction to HPC with MPI for Data Science | year=2016 | publisher=Springer |isbn=978-3-319-21903-5 |pages=195–211
|chapter=2. Introduction to MPI: The Message-Passing Interface | url=https://franknielsen.github.io/HPC4DS/index.html
|chapter-url=https://www.researchgate.net/publication/314626214 }}</ref> [[parallel computers]]. Both point-to-point and collective communication are supported. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation."<ref>{{harvnb |Gropp |Lusk |Skjellum |1996 |p=3 }}</ref> MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in [[high-performance computing]] as of 2006.<ref>{{cite book|pages = 105|first1=Sayantan|last1=Sur|first2=Matthew J.|last2=Koop|first3=Dhabaleswar K.|last3=Panda| title=Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06 | chapter=MPI and communication---High-performance and scalable MPI over InfiniBand with reduced memory usage: An in-depth performance analysis |date=11 November 2006|publisher=ACM|doi=10.1145/1188455.1188565|isbn = 978-0769527000|s2cid = 818662}}</ref>
 
MPI is not sanctioned by any major standards body; nevertheless, it has become a [[de facto standard|''de facto'' standard]] for [[communication]] among processes that model a [[parallel programming|parallel program]] running on a [[distributed memory]] system. Actual distributed memory supercomputers such as computer clusters often run such programs.
 
{{Anchor|VERSIONS}}
At present, the standard has several versions: version 1.3 (commonly abbreviated ''MPI-1''), which emphasizes message passing and has a static runtime environment; MPI-2.2 (MPI-2), which includes new features such as parallel I/O, dynamic process management and remote memory operations;<ref name="Gropp99adv-pp4-5">{{harvnb|Gropp|Lusk|Skjellum|1999b|pp=4–5}}</ref> and MPI-3.1 (MPI-3), which includes extensions to the collective operations with non-blocking versions and extensions to the one-sided operations.<ref name="MPI_3.1">[http://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf MPI: A Message-Passing Interface Standard<br />Version 3.1, Message Passing Interface Forum, June 4, 2015]. http://www.mpi-forum.org. Retrieved on 2015-06-16.</ref>
MPI-2's LIS specifies over 500 functions and provides language bindings for ISO [[C (programming language)|C]], ISO [[C++]], and [[Fortran 90]]. Object interoperability was also added to allow easier mixed-language message passing programming. A side-effect of standardizing MPI-2, completed in 1996, was clarifying the MPI-1 standard, creating MPI-1.2.
 
The MPI interface is meant to provide essential virtual topology, [[synchronization]], and communication functionality between a set of processes (that have been mapped to nodes/servers/computer instances) in a language-independent way, with language-specific syntax (bindings), plus a few language-specific features. MPI programs always work with processes, but programmers commonly refer to the processes as processors. Typically, for maximum performance, each [[CPU]] (or [[multi-core (computing)|core]] in a multi-core machine) will be assigned just a single process. This assignment happens at runtime through the agent that starts the MPI program, normally called mpirun or mpiexec.
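The process/rank model described above can be illustrated with a minimal sketch (an illustration, not taken from the standard's text): each process reports its rank and the size of <code>MPI_COMM_WORLD</code>, under a launcher invocation such as <code>mpirun -np 4 ./a.out</code>.
<syntaxhighlight lang="c">
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id (0..size-1) */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes launched */
    printf("process %d of %d\n", rank, size);
    MPI_Finalize();                        /* shut the runtime down */
    return 0;
}
</syntaxhighlight>
The number of processes is fixed by the launcher, not by the program; the same binary is run by every process, and the rank is the only thing distinguishing them.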
 
MPI library functions include, but are not limited to, point-to-point rendezvous-type send/receive operations, choosing between a [[Cartesian tree|Cartesian]] or [[Graph (data structure)|graph]]-like logical process topology, exchanging data between process pairs (send/receive operations), combining partial results of computations (gather and reduce operations), synchronizing nodes (barrier operation) as well as obtaining network-related information such as the number of processes in the computing session, current processor identity that a process is mapped to, neighboring processes accessible in a logical topology, and so on. Point-to-point operations come in [[synchronization (computer science)|synchronous]], [[asynchronous i/o|asynchronous]], buffered, and ''ready'' forms, to allow both relatively stronger and weaker [[semantics]] for the synchronization aspects of a rendezvous-send. Many pending operations are possible in asynchronous mode, in most implementations.
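As a sketch of the point-to-point operations just listed (values and tag chosen arbitrarily for illustration), the following fragment sends one integer from rank 0 to rank 1 using the blocking standard-mode calls:
<syntaxhighlight lang="c">
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size >= 2) {               /* need at least two processes */
        if (rank == 0) {
            int payload = 42;
            /* blocking standard-mode send to rank 1, message tag 0 */
            MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            int payload;
            /* matching blocking receive from rank 0, same tag */
            MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 1 received %d\n", payload);
        }
    }
    MPI_Finalize();
    return 0;
}
</syntaxhighlight>
The asynchronous variants (<code>MPI_Isend</code>/<code>MPI_Irecv</code>) return immediately and are completed later with a wait or test call, which is what allows many pending operations at once.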
 
MPI-1 and MPI-2 both enable implementations that overlap communication and computation, but practice and theory differ. MPI also specifies ''[[thread safe]]'' interfaces, which have [[cohesion (computer science)|cohesion]] and [[coupling (computer science)|coupling]] strategies that help avoid hidden state within the interface. It is relatively easy to write multithreaded point-to-point MPI code, and some implementations support such code. [[Multithreading (computer architecture)|Multithreaded]] collective communication is best accomplished with multiple copies of Communicators, as described below.
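The use of separate communicator copies mentioned above can be sketched as follows (a minimal illustration using only standard MPI calls): duplicating <code>MPI_COMM_WORLD</code> yields a communicator whose messages and collectives can never match traffic on the original, isolating, for example, one library layer or thread from the rest of the application.
<syntaxhighlight lang="c">
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm lib_comm;
    MPI_Init(&argc, &argv);
    /* Duplicate the world communicator: collectives posted on
       lib_comm are matched only against other lib_comm collectives. */
    MPI_Comm_dup(MPI_COMM_WORLD, &lib_comm);
    MPI_Barrier(lib_comm);       /* a collective confined to the copy */
    MPI_Comm_free(&lib_comm);    /* release the duplicate */
    MPI_Finalize();
    return 0;
}
</syntaxhighlight>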
 
===Derived data types===
Many MPI functions require specifying the type of data that is sent between processes. This is because MPI aims to support heterogeneous environments where types might be represented differently on the different nodes<ref name="node37">{{cite web|url=http://mpi-forum.org/docs/mpi-1.1/mpi-11-html/node37.html|title=Type matching rules|website=mpi-forum.org}}</ref> (for example, they might be running different CPU architectures that have different [[endianness]]), in which case MPI implementations can perform ''data conversion''.<ref name="node37" /> Since the C language does not allow a type itself to be passed as a parameter, MPI predefines constants such as <code>MPI_INT</code>, <code>MPI_CHAR</code> and <code>MPI_DOUBLE</code> to correspond with <code>int</code>, <code>char</code>, <code>double</code>, etc.
 
Here is an example in C that passes arrays of <code>int</code>s from all processes to one. The one receiving process is called the "root" process, and it can be any designated process but normally it will be process 0. All the processes ask to send their arrays to the root with <code>MPI_Gather</code>, which is equivalent to having each process (including the root itself) call <code>MPI_Send</code> and the root make the corresponding number of ordered <code>MPI_Recv</code> calls to assemble all of these arrays into a larger one:<ref>{{cite web|url=https://www.open-mpi.org/doc/v1.8/man3/MPI_Gather.3.php|title=MPI_Gather(3) man page (version 1.8.8)|website=www.open-mpi.org}}</ref>
</syntaxhighlight>
 
However, it may instead be desirable to send the data as one block as opposed to 100 <code>int</code>s. To do this, define a "contiguous block" derived data type:
<syntaxhighlight lang="c">
MPI_Datatype newtype;
 
===I/O===
{{Expand section|date=June 2008}}
 
The parallel I/O feature is sometimes called MPI-IO,<ref name="Gropp99adv-pp5-6">{{harvnb |Gropp |Lusk |Skjellum |1999b |pp=5–6 }}</ref> and refers to a set of functions designed to abstract I/O management on distributed systems to MPI, and allow files to be easily accessed in a patterned way using the existing derived datatype functionality.
 
The little research that has been done on this feature indicates that it may not be trivial to get high performance gains by using MPI-IO. For example, an implementation of sparse [[Matrix multiplication|matrix-vector multiplications]] using the MPI I/O library shows a general behavior of minor performance gain, but these results are inconclusive.<ref>{{Cite web|url=http://marcovan.hulten.org/report.pdf|title=Sparse matrix-vector multiplications using the MPI I/O library}}</ref> It was not
until the idea of collective I/O<ref>{{cite web|title=Data Sieving and Collective I/O in ROMIO|url=http://www.mcs.anl.gov/~thakur/papers/romio-coll.pdf|publisher=IEEE|date=Feb 1999}}</ref> was implemented in MPI-IO that MPI-IO started to reach widespread adoption. Collective I/O substantially boosts applications' I/O bandwidth by having processes collectively transform the small and noncontiguous I/O operations into large and contiguous ones, thereby reducing the [[Record locking|locking]] and disk seek overhead. Due to its vast performance benefits, MPI-IO also became the underlying I/O layer for many state-of-the-art I/O libraries, such as [[HDF5]] and [[NetCDF|Parallel NetCDF]]. Its popularity also triggered research on collective I/O optimizations, such as layout-aware I/O<ref>{{cite book|chapter=LACIO: A New Collective I/O Strategy for Parallel I/O Systems|publisher=IEEE|date=Sep 2011|doi=10.1109/IPDPS.2011.79|isbn=978-1-61284-372-8|citeseerx=10.1.1.699.8972|title=2011 IEEE International Parallel & Distributed Processing Symposium|last1=Chen|first1=Yong|last2=Sun|first2=Xian-He|last3=Thakur|first3=Rajeev|last4=Roth|first4=Philip C.|last5=Gropp|first5=William D.|pages=794–804|s2cid=7110094}}</ref> and cross-file aggregation.<ref>{{cite journal|author1=Teng Wang|author2=Kevin Vasko|author3=Zhuo Liu|author4=Hui Chen|author5=Weikuan Yu|title=Enhance parallel input/output with cross-bundle aggregation|journal=The International Journal of High Performance Computing Applications|volume=30|issue=2|pages=241–256|date=2016|doi=10.1177/1094342015618017|s2cid=12067366}}</ref><ref>{{cite book|chapter=BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution|publisher=IEEE|date=Nov 2014|doi=10.1109/DISCS.2014.6|isbn=978-1-4673-6750-9|title=2014 International Workshop on Data Intensive Scalable Computing 
Systems|last1=Wang|first1=Teng|last2=Vasko|first2=Kevin|last3=Liu|first3=Zhuo|last4=Chen|first4=Hui|last5=Yu|first5=Weikuan|pages=25–32|s2cid=2402391}}</ref>
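Collective I/O can be sketched with a minimal example (the file name is an arbitrary illustration): each process writes its own disjoint block of a shared file in a single collective call, which lets the implementation merge the requests into large contiguous transfers.
<syntaxhighlight lang="c">
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, data[100];
    MPI_File fh;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < 100; i++)
        data[i] = rank;                    /* each rank writes its own id */

    MPI_File_open(MPI_COMM_WORLD, "demo.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    /* The _all suffix marks the call as collective: all processes in the
       communicator participate, enabling two-phase optimizations. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * sizeof data,
                          data, 100, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
</syntaxhighlight>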
 
==Official implementations==
 
* The initial implementation of the MPI 1.x standard was [[MPICH]], from [[Argonne National Laboratory]] (ANL) and [[Mississippi State University]]. [[IBM]] also was an early implementor, and most early 90s [[supercomputer]] companies either commercialized MPICH, or built their own implementation. [[LAM/MPI]] from [[Ohio Supercomputer Center]] was another early open implementation. ANL has continued developing MPICH for over a decade, and now offers MPICH-4.3.2, implementing the MPI-4.1 standard.
* [[Open MPI]] (not to be confused with [[OpenMP]]) was formed by merging FT-MPI, LA-MPI, [[LAM/MPI]], and PACX-MPI, and is found in many [[TOP-500]] [[supercomputer]]s.
 
 
===Common Language Infrastructure===
The two managed [[Common Language Infrastructure]] [[.NET Framework|.NET]] implementations are Pure Mpi.NET<ref>[http://www.purempi.net Pure Mpi.NET]</ref> and MPI.NET,<ref>{{cite web|url=http://www.osl.iu.edu/research/mpi.net/|title=MPI.NET: High-Performance C# Library for Message Passing|website=www.osl.iu.edu}}</ref> a research effort at [[Indiana University]] licensed under a [[BSD]]-style license. It is compatible with [[Mono (software)|Mono]], and can make full use of underlying low-latency MPI network fabrics.
 
===Java===
 
===Python===
Actively maintained MPI wrappers for [[Python (programming language)|Python]] include: mpi4py,<ref>{{Cite web|url=https://mpi4py.readthedocs.io/en/stable/|title=MPI for Python — MPI for Python 4.1.0 documentation|website=mpi4py.readthedocs.io}}</ref> numba-mpi<ref>{{Cite web|url=https://pypi.org/project/numba-mpi/|title=numba-mpi|website=pypi.org}}</ref> and mpi4jax.<ref>{{Cite web|url=https://mpi4jax.readthedocs.io/en/latest/|title=mpi4jax — mpi4jax documentation|website=mpi4jax.readthedocs.io}}</ref>
 
Discontinued developments include: [[pyMPI]], pypar,<ref>{{cite web|url=https://code.google.com/p/pypar/|title=Google Code Archive - Long-term storage for Google Code Project Hosting.|website=code.google.com}}</ref> MYMPI<ref>Now part of [https://sourceforge.net/projects/pydusa/ Pydusa]</ref> and the MPI submodule in [[ScientificPython]].
 
===R===
# MPI-2 implementations include I/O and dynamic process management, and the size of the middleware is substantially larger. Most sites that use batch scheduling systems cannot support dynamic process management. MPI-2's parallel I/O is well accepted.{{Citation needed|date=January 2011}}
# Many MPI-1.2 programs were developed before MPI-2. Portability concerns initially slowed adoption, although wider support has lessened this.
# Many MPI-1.2 applications use only a subset of that standard (16–25 functions) with no real need for MPI-2 functionality.
 
==Future==
Some aspects of MPI's future appear solid; others less so. The MPI Forum reconvened in 2007 to clarify some MPI-2 issues and explore developments for a possible MPI-3, which resulted in versions MPI-3.0 (September 2012)<ref>https://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf {{Bare URL PDF|date=July 2025}}</ref> and MPI-3.1 (June 2015).<ref>https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf {{Bare URL PDF|date=July 2025}}</ref> The development continued with the approval of MPI-4.0 on June 9, 2021,<ref>https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf {{Bare URL PDF|date=July 2025}}</ref> and most recently, MPI-4.1 was approved on November 2, 2023.<ref>https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf {{Bare URL PDF|date=July 2025}}</ref>
 
Architectures are changing, with greater internal concurrency ([[Multi-core processor|multi-core]]), better fine-grained concurrency control (threading, affinity), and more levels of [[memory hierarchy]]. [[Multithreading (computer architecture)|Multithreaded]] programs can take advantage of these developments more easily than single-threaded applications. This has already yielded separate, complementary standards for [[symmetric multiprocessing]], namely [[OpenMP]]. MPI-2 defines how standard-conforming implementations should deal with multithreaded issues, but does not require that implementations be multithreaded, or even thread-safe. MPI-3 adds the ability to use shared-memory parallelism within a node. Implementations of MPI such as Adaptive MPI, Hybrid MPI, Fine-Grained MPI, MPC and others offer extensions to the MPI standard that address different challenges in MPI.
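The thread-support negotiation that MPI-2 defines can be sketched as follows: the program requests a support level at initialization and must check the level the implementation actually provides, which may be lower than requested.
<syntaxhighlight lang="c">
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    /* Ask for full multithreading (any thread may call MPI at any time);
       the implementation reports what it actually grants in `provided`. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        printf("full thread support unavailable (got level %d)\n", provided);
    MPI_Finalize();
    return 0;
}
</syntaxhighlight>
The defined levels, in increasing order, are <code>MPI_THREAD_SINGLE</code>, <code>MPI_THREAD_FUNNELED</code>, <code>MPI_THREAD_SERIALIZED</code> and <code>MPI_THREAD_MULTIPLE</code>; a standard-conforming implementation may grant any of them.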
|title=A High-Performance, Portable Implementation of the MPI Message Passing Interface
|journal=Parallel Computing |volume=22 |issue=6 |pages=789–828 |doi=10.1016/0167-8191(96)00024-5 }}
* Pacheco, Peter S. (1997) ''[https://books.google.com/books?&id=tCVkM1z2aOoC Parallel Programming with MPI]''. 500 pp. Morgan Kaufmann {{ISBN|1-55860-339-5}}.
* ''MPI—The Complete Reference'' series:
** Snir, Marc; Otto, Steve W.; Huss-Lederman, Steven; Walker, David W.; Dongarra, Jack J. (1995) ''[http://www.netlib.org/utk/papers/mpi-book/mpi-book.html MPI: The Complete Reference]''. MIT Press Cambridge, MA, USA. {{ISBN|0-262-69215-5}}