{{short description|Message-passing system for parallel computers}}
 
The '''Message Passing Interface''' ('''MPI''') is a portable [[message-passing]] standard designed to function on [[parallel computing]] [[computer architecture|architectures]].<ref>{{Cite web |title=Message Passing Interface :: High Performance Computing |url=https://hpc.nmsu.edu/discovery/mpi/introduction/ |access-date=2022-08-06 |website=hpc.nmsu.edu}}</ref> The MPI standard defines the [[syntax (programming languages)|syntax]] and [[semantics]] of [[library routine]]s that are useful to a wide range of users writing [[software portability|portable]] message-passing programs in [[C (programming language)|C]], [[C++]], and [[Fortran]]. There are several [[open-source]] MPI [[programming language implementation|implementations]], which fostered the development of a parallel [[software industry]], and encouraged development of portable and scalable large-scale parallel applications.
 
==History==
The message passing interface effort began in the summer of 1991 when a small group of researchers started discussions at a mountain retreat in Austria. Out of that discussion came a Workshop on Standards for Message Passing in a Distributed Memory Environment, held on April 29–30, 1992 in [[Williamsburg, Virginia]].<ref>{{cite report |id= ORNL/TM-12147 |osti= 10170156 |author= Walker DW |date= August 1992 |title= Standards for message-passing in a distributed memory environment |url= https://technicalreports.ornl.gov/1992/3445603661204.pdf |institution= Oak Ridge National Lab., TN (United States), Center for Research on Parallel Computing (CRPC) |pages= 25 |access-date= 2019-08-18 |archive-date= 2023-11-15 |archive-url= https://web.archive.org/web/20231115183232/https://technicalreports.ornl.gov/1992/3445603661204.pdf |url-status= dead }}</ref> Attendees at Williamsburg discussed the basic features essential to a standard message-passing interface and established a working group to continue the standardization process. [[Jack Dongarra]], [[Tony Hey]], and David W. Walker put forward a preliminary draft proposal, "MPI1", in November 1992. In November 1992 a meeting of the MPI working group took place in Minneapolis and decided to place the standardization process on a more formal footing. The MPI working group met every 6 weeks throughout the first 9 months of 1993. The draft MPI standard was presented at the Supercomputing '93 conference in November 1993.<ref>{{cite conference |title= MPI: A Message Passing Interface |author= The MPI Forum, CORPORATE |date= November 15–19, 1993 |conference= Supercomputing '93 |conference-url= http://supercomputing.org/ |book-title= Proceedings of the 1993 ACM/IEEE conference on Supercomputing |publisher= ACM |___location= Portland, Oregon, USA |pages= 878–883 |isbn= 0-8186-4340-4 |doi= 10.1145/169627.169855 |doi-access= free }}</ref> After a period of public comments, which resulted in some changes in MPI, version 1.0 of MPI was released in June 1994. These meetings and the email discussion together constituted the MPI Forum, membership of which has been open to all members of the [[High-performance computing|high-performance-computing]] community.
 
The MPI effort involved about 80 people from 40 organizations, mainly in the United States and Europe. Most of the major vendors of [[concurrent computer]]s were involved in the MPI effort, collaborating with researchers from universities, government laboratories, and [[Private industry|industry]].
 
MPI provides parallel hardware vendors with a clearly defined base set of routines that can be efficiently implemented. As a result, hardware vendors can build upon this collection of standard [[Low-level programming language|low-level]] routines to create [[High-level programming language|higher-level]] routines for the distributed-memory communication environment supplied with their [[parallel machine]]s. MPI provides a simple-to-use portable interface for the basic user, yet one powerful enough to allow programmers to use the high-performance message passing operations available on advanced machines.
 
In an effort to create a universal standard for message passing, researchers did not base it on a single system but incorporated the most useful features of several systems, including those designed by IBM, [[Intel]], [[nCUBE]], [[Parallel Virtual Machine|PVM]], Express, P4 and PARMACS. The message-passing paradigm is attractive because of its wide portability: it can be used for communication on distributed-memory and shared-memory multiprocessors, networks of workstations, and combinations of these elements. The paradigm applies in multiple settings, independent of network speed or memory architecture.
 
Support for MPI meetings came in part from [[DARPA]] and from the U.S. [[National Science Foundation]] (NSF) under grant ASC-9310330, NSF Science and Technology Center Cooperative agreement number CCR-8809615, and from the [[European Commission]] through Esprit Project P6643. The [[University of Tennessee]] also made financial contributions to the MPI Forum.
 
== Overview ==
{{Update|part=section|date=August 2022|reason=The new features of the MPI-3, MPI-4, and MPI-5 are not well described. According to the specification: "MPI-3 standard contains significant extensions to MPI functionality, including nonblocking collectives, new one-sided communication operations, and Fortran 2008 bindings."}}
MPI is a [[communication protocol]] for programming<ref>{{cite book |first=Frank |last=Nielsen | title=Introduction to HPC with MPI for Data Science | year=2016 | publisher=Springer |isbn=978-3-319-21903-5 |pages=195–211
|chapter=2. Introduction to MPI: The MessagePassing Interface | url=https://franknielsen.github.io/HPC4DS/index.html
|chapter-url=https://www.researchgate.net/publication/314626214 }}</ref> [[parallel computers]]. Both point-to-point and collective communication are supported. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation."<ref>{{harvnb |Gropp |Lusk |Skjellum |1996 |p=3 }}</ref> MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in [[high-performance computing]] as of 2006.<ref>{{cite book|pages = 105|first1=Sayantan|last1=Sur|first2=Matthew J.|last2=Koop|first3=Dhabaleswar K.|last3=Panda| title=Proceedings of the 2006 ACM/IEEE conference on Supercomputing - SC '06 | chapter=High-performance and scalable MPI over InfiniBand with reduced memory usage: An in-depth performance analysis |date=11 November 2006|publisher=ACM|doi=10.1145/1188455.1188565|isbn = 978-0769527000|s2cid = 818662}}</ref>
 
MPI is not sanctioned by any major standards body; nevertheless, it has become a [[de facto standard|''de facto'' standard]] for [[communication]] among processes that model a [[parallel programming|parallel program]] running on a [[distributed memory]] system. Actual distributed memory supercomputers such as computer clusters often run such programs.
 
The principal MPI-1 model has no [[shared memory]] concept, and MPI-2 has only a limited [[distributed shared memory]] concept. Nonetheless, MPI programs are regularly run on shared memory computers, and both [[MPICH]] and [[Open MPI]] can use shared memory for message transfer if it is available.<ref>[http://knem.gforge.inria.fr/ KNEM: High-Performance Intra-Node MPI Communication] "MPICH2 (since release 1.1.1) uses KNEM in the DMA LMT to improve large message performance within a single node. Open MPI also includes KNEM support in its SM BTL component since release 1.5. Additionally, NetPIPE includes a KNEM backend since version 3.7.2."</ref><ref>{{cite web|url=https://www.open-mpi.org/faq/?category=sm|title=FAQ: Tuning the run-time characteristics of MPI sm communications|website=www.open-mpi.org}}</ref> Designing programs around the MPI model (contrary to explicit [[Shared memory (interprocess communication)|shared memory]] models) has advantages when running on [[Non-Uniform Memory Access|NUMA]] architectures since MPI encourages [[locality of reference|memory locality]]. Explicit shared memory programming was introduced in MPI-3.<ref>https://software.intel.com/en-us/articles/an-introduction-to-mpi-3-shared-memory-programming?language=en "The MPI-3 standard introduces another approach to hybrid programming that uses the new MPI Shared Memory (SHM) model"</ref><ref>[http://insidehpc.com/2016/01/shared-memory-mpi-3-0/ Shared Memory and MPI 3.0] "Various benchmarks can be run to determine which method is best for a particular application, whether using MPI + OpenMP or the MPI SHM extensions. On a fairly simple test case, speedups over a base version that used point to point communication were up to 5X, depending on the message."</ref><ref>[http://www.caam.rice.edu/~mk51/presentations/SIAMPP2016_4.pdf Using MPI-3 Shared Memory As a Multicore Programming System] (PDF presentation slides)</ref>
 
Although MPI belongs in layers 5 and higher of the [[OSI Reference Model]], implementations may cover most layers, with [[Internet socket|sockets]] and [[Transmission Control Protocol]] (TCP) used in the transport layer.
 
Most MPI implementations consist of a specific set of routines directly callable from [[C (programming language)|C]], [[C++]], [[Fortran]] (i.e., an API) and any language able to interface with such libraries, including [[C Sharp (programming language)|C#]], [[Java (programming language)|Java]] or [[Python (programming language)|Python]]. The advantages of MPI over older message passing libraries are portability (because MPI has been implemented for almost every distributed memory architecture) and speed (because each implementation is in principle optimized for the hardware on which it runs).
 
MPI uses [[Language Independent Specification]]s (LIS) for calls and language bindings. The first MPI standard specified [[ANSI C]] and Fortran-77 bindings together with the LIS. The draft was presented at Supercomputing 1994 (November 1994)<ref name="SC94">[http://hpc.sagepub.com/content/8/3-4.toc Table of Contents — September 1994, 8 (3-4)]. Hpc.sagepub.com. Retrieved on 2014-03-24.</ref> <!-- (Obs: TO FIX: a reference must be included, because at sc94 site it is not possible to find references to support this statement: http://sc94.ameslab.gov/, http://sc94.ameslab.gov/AP/contents.html, http://www.pubzone.org/pages/publications/showVenue.do?venueId=13270) --> and finalized soon thereafter. About 128 functions constitute the MPI-1.3 standard which was released as the final end of the MPI-1 series in 2008.<ref name="MPI_Docs">[http://www.mpi-forum.org/docs/ MPI Documents]. Mpi-forum.org. Retrieved on 2014-03-24.</ref>
 
{{Anchor|VERSIONS}}
At present, the standard has several versions: version 1.3 (commonly abbreviated ''MPI-1''), which emphasizes message passing and has a static runtime environment, MPI-2.2 (MPI-2), which includes new features such as parallel I/O, dynamic process management and remote memory operations,<ref name="Gropp99adv-pp4-5">{{harvnb|Gropp|Lusk|Skjellum|1999b|pp=4–5}}</ref> and MPI-3.1 (MPI-3), which includes extensions to the collective operations with non-blocking versions and extensions to the one-sided operations.<ref name="MPI_3.1">[http://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf MPI: A Message-Passing Interface Standard<br />Version 3.1, Message Passing Interface Forum, June 4, 2015]. http://www.mpi-forum.org. Retrieved on 2015-06-16.</ref>
MPI-2's LIS specifies over 500 functions and provides language bindings for ISO [[C (programming language)|C]], ISO [[C++]], and [[Fortran 90]]. Object interoperability was also added to allow easier mixed-language message passing programming. A side-effect of standardizing MPI-2, completed in 1996, was clarifying the MPI-1 standard, creating the MPI-1.2.
 
''MPI-2'' is mostly a superset of MPI-1, although some functions have been deprecated. MPI-1.3 programs still work under MPI implementations compliant with the MPI-2 standard.
 
''MPI-3.0'' introduces significant updates to the MPI standard, including nonblocking versions of collective operations, enhancements to one-sided operations, and a Fortran 2008 binding. It removes deprecated C++ bindings and various obsolete routines and objects. Importantly, any valid MPI-2.2 program that avoids the removed elements is also valid in MPI-3.0.
 
''MPI-3.1'' is a minor update focused on corrections and clarifications, particularly for Fortran bindings. It introduces new functions for manipulating MPI_Aint values, nonblocking collective I/O routines, and methods for retrieving index values by name for MPI_T performance variables. Additionally, a general index was added. All valid MPI-3.0 programs are also valid in MPI-3.1.
 
''MPI-4.0'' is a major update that introduces large-count versions of many routines, persistent collective operations, partitioned communications, and a new MPI initialization method. It also adds application info assertions and improves error handling definitions, along with various smaller enhancements. Any valid MPI-3.1 program is compatible with MPI-4.0.
 
''MPI-4.1'' is a minor update focused on corrections and clarifications to the MPI-4.0 standard. It deprecates several routines, the MPI_HOST attribute key, and the mpif.h Fortran include file. A new routine has been added to inquire about the hardware running the MPI program. Any valid MPI-4.0 program remains valid in MPI-4.1.
 
''MPI-5.0'' is a major update that introduces an [[Application Binary Interface]]. This allows for increased interoperability of MPI libraries from different MPI vendors, as well as increased performance in containerized environments.<ref>[https://www.hlrs.de/news/detail/mpi-forum-meets-at-hlrs-on-path-to-mpi-5 MPI Forum Meets at HLRS on Path to MPI 5.0]. Retrieved 2025-09-30</ref>
 
MPI is often compared with [[Parallel Virtual Machine]] (PVM), which was a popular distributed environment and message passing system developed in 1989, and which was one of the systems that motivated the need for standard parallel message passing. Threaded shared memory programming models (such as [[Pthreads]] and [[OpenMP]]) and message passing programming (MPI/PVM) can be considered complementary and have been used together on occasion in, for example, servers with multiple large shared-memory nodes.
 
==Functionality==
{{Unreferenced section|date=July 2021}}
The MPI interface is meant to provide essential virtual topology, [[synchronization]], and communication functionality between a set of processes (that have been mapped to nodes/servers/computer instances) in a language-independent way, with language-specific syntax (bindings), plus a few language-specific features. MPI programs always work with processes, but programmers commonly refer to the processes as processors. Typically, for maximum performance, each [[CPU]] (or [[multi-core (computing)|core]] in a multi-core machine) will be assigned just a single process. This assignment happens at runtime through the agent that starts the MPI program, normally called mpirun or mpiexec.
 
MPI library functions include, but are not limited to, point-to-point rendezvous-type send/receive operations, choosing between a [[Cartesian tree|Cartesian]] or [[Graph (data structure)|graph]]-like logical process topology, exchanging data between process pairs (send/receive operations), combining partial results of computations (gather and reduce operations), synchronizing nodes (barrier operation) as well as obtaining network-related information such as the number of processes in the computing session, current processor identity that a process is mapped to, neighboring processes accessible in a logical topology, and so on. Point-to-point operations come in [[synchronization (computer science)|synchronous]], [[asynchronous i/o|asynchronous]], buffered, and ''ready'' forms, to allow both relatively stronger and weaker [[semantics]] for the synchronization aspects of a rendezvous-send. Many pending operations are possible in asynchronous mode, in most implementations.
 
MPI-1 and MPI-2 both enable implementations that overlap communication and computation, but practice and theory differ. MPI also specifies ''[[thread safe]]'' interfaces, which have [[cohesion (computer science)|cohesion]] and [[coupling (computer science)|coupling]] strategies that help avoid hidden state within the interface. It is relatively easy to write multithreaded point-to-point MPI code, and some implementations support such code. [[Multithreading (computer architecture)|Multithreaded]] collective communication is best accomplished with multiple copies of Communicators, as described below.
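
The degree of threading available is negotiated when MPI starts up. The sketch below is illustrative only (error handling omitted): it requests the highest level, <code>MPI_THREAD_MULTIPLE</code>, and checks which level the library actually grants, which may be lower.

<syntaxhighlight lang="c">
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int provided;

    /* Request full multithreaded support; the implementation reports the
       level it actually provides, which may be lower than requested. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE)
        printf("Warning: the MPI library only provides thread level %d\n",
               provided);

    /* ... multithreaded MPI code would go here ... */

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>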
 
==Concepts==
MPI provides several features. The following concepts provide context for all of those abilities and help the programmer to decide what functionality to use in their application programs. Four of MPI's eight basic concepts are unique to MPI-2.
 
===Communicator===
Communicator objects connect groups of processes in the MPI session. Each communicator gives each contained process an independent identifier and arranges its contained processes in an ordered [[topology (disambiguation)|topology]]. MPI also has explicit groups, but these are mainly good for organizing and reorganizing groups of processes before another communicator is made. MPI understands single group intracommunicator operations, and bilateral intercommunicator communication. In MPI-1, single group operations are most prevalent. [[Bilateral synchronization|Bilateral]] operations mostly appear in MPI-2 where they include collective communication and dynamic in-process management.
 
Communicators can be partitioned using several MPI commands. These commands include <code>MPI_COMM_SPLIT</code>, where each process joins one of several colored sub-communicators by declaring itself to have that color.
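
For illustration, the following sketch splits <code>MPI_COMM_WORLD</code> into row sub-communicators; the grid width of four used to compute the <code>color</code> is an arbitrary choice.

<syntaxhighlight lang="c">
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int world_rank, world_size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Treat the ranks as a 2D grid with 4 columns: processes that share
       a "row" (the color) end up in the same sub-communicator. */
    int color = world_rank / 4;
    MPI_Comm row_comm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &row_comm);

    int row_rank, row_size;
    MPI_Comm_rank(row_comm, &row_rank);
    MPI_Comm_size(row_comm, &row_size);
    printf("World rank %d has rank %d of %d in row %d\n",
           world_rank, row_rank, row_size, color);

    MPI_Comm_free(&row_comm);
    MPI_Finalize();
    return 0;
}
</syntaxhighlight>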
 
===Point-to-point basics===
A number of important MPI functions involve communication between two specific processes. A popular example is <code>MPI_Send</code>, which allows one specified process to send a message to a second specified process. Point-to-point operations, as these are called, are particularly useful in patterned or irregular communication, for example, a [[data parallelism|data-parallel]] architecture in which each processor routinely swaps regions of data with specific other processors between calculation steps, or a [[master/slave (technology)|master–slave]] architecture in which the master sends new task data to a slave whenever the prior task is completed.
 
MPI-1 specifies mechanisms for both [[blocking (computing)|blocking]] and non-blocking point-to-point communication mechanisms, as well as the so-called 'ready-send' mechanism whereby a send request can be made only when the matching receive request has already been made.
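
The following illustrative fragment (message contents and tag are arbitrary, and at least two ranks are assumed) pairs a blocking <code>MPI_Send</code> with a non-blocking <code>MPI_Irecv</code> that is later completed with <code>MPI_Wait</code>, allowing the receiver to overlap the transfer with other work.

<syntaxhighlight lang="c">
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, data = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        data = 42;
        /* Blocking send: returns once the buffer may safely be reused. */
        MPI_Send(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Request req;
        /* Non-blocking receive: post the request, do other work,
           then wait for completion before reading the buffer. */
        MPI_Irecv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        /* ... computation that does not depend on 'data' ... */
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d\n", data);
    }

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>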
 
===Collective basics===
[[Collective operation|Collective functions]] involve communication among all processes in a process group (which can mean the entire process pool or a program-defined subset). A typical function is the <code>MPI_Bcast</code> call (short for "[[broadcasting (computing)|broadcast]]"). This function takes data from one node and sends it to all processes in the process group. A reverse operation is the <code>MPI_Reduce</code> call, which takes data from all processes in a group, performs an operation (such as summing), and stores the results on one node. <code>MPI_Reduce</code> is often useful at the start or end of a large distributed calculation, where each processor operates on a part of the data and then combines it into a result.
 
Other operations perform more sophisticated tasks, such as <code>MPI_Alltoall</code> which rearranges ''n'' items of data such that the ''n''th node gets the ''n''th item of data from each.
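
A small illustrative program combining the two most common collectives (the values involved are arbitrary): rank 0 broadcasts a number with <code>MPI_Bcast</code>, each rank computes a partial result, and <code>MPI_Reduce</code> sums the partial results back onto rank 0.

<syntaxhighlight lang="c">
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size, n = 0, partial, total;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        n = 100;                        /* value chosen by the root */

    /* Broadcast n from rank 0 to every process in the communicator. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    partial = n * rank;                 /* each process computes its share */

    /* Sum the partial results onto rank 0. */
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum over %d processes: %d\n", size, total);

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>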
 
===Derived data types===
Many MPI functions require specifying the type of data which is sent between processes. This is because MPI aims to support heterogeneous environments where types might be represented differently on the different nodes<ref name="node37">{{cite web|url=http://mpi-forum.org/docs/mpi-1.1/mpi-11-html/node37.html|title=Type matching rules|website=mpi-forum.org}}</ref> (for example, they might be running different CPU architectures that have different [[endianness]]), in which case MPI implementations can perform ''data conversion''.<ref name="node37" /> Since the C language does not allow a type itself to be passed as a parameter, MPI predefines the constants <code>MPI_INT</code>, <code>MPI_CHAR</code>, <code>MPI_DOUBLE</code> to correspond with <code>int</code>, <code>char</code>, <code>double</code>, etc.
 
Here is an example in C that passes arrays of <code>int</code>s from all processes to one. The one receiving process is called the "root" process, and it can be any designated process but normally it will be process 0. All the processes ask to send their arrays to the root with <code>MPI_Gather</code>, which is equivalent to having each process (including the root itself) call <code>MPI_Send</code> and the root make the corresponding number of ordered <code>MPI_Recv</code> calls to assemble all of these arrays into a larger one:<ref>{{cite web|url=https://www.open-mpi.org/doc/v1.8/man3/MPI_Gather.3.php|title=MPI_Gather(3) man page (version 1.8.8)|website=www.open-mpi.org}}</ref>
<syntaxhighlight lang="c">
int send_array[100];
int root = 0; /* or whatever */
int num_procs, *recv_array;
MPI_Comm_size(comm, &num_procs);
recv_array = malloc(num_procs * sizeof(send_array));
MPI_Gather(send_array, sizeof(send_array) / sizeof(*send_array), MPI_INT,
           recv_array, sizeof(send_array) / sizeof(*send_array), MPI_INT,
           root, comm);
</syntaxhighlight>
 
However, it may instead be desirable to send data as one block as opposed to 100 <code>int</code>s. To do this, define a "contiguous block" derived data type:
<syntaxhighlight lang="c">
MPI_Datatype newtype;
MPI_Type_contiguous(100, MPI_INT, &newtype);
MPI_Type_commit(&newtype);
MPI_Gather(array, 1, newtype, receive_array, 1, newtype, root, comm);
</syntaxhighlight>
 
For passing a class or a data structure, <code>MPI_Type_create_struct</code> creates an MPI derived data type from predefined MPI data types, as follows:
<syntaxhighlight lang="c">
int MPI_Type_create_struct(int count,
                           int *blocklen,
                           MPI_Aint *disp,
                           MPI_Datatype *type,
                           MPI_Datatype *newtype)
</syntaxhighlight>
where:
* <code>count</code> is the number of blocks, and specifies the length (in elements) of the arrays <code>blocklen</code>, <code>disp</code>, and <code>type</code>;
* <code>blocklen</code> contains the number of elements in each block;
* <code>disp</code> contains the byte displacement of each block;
* <code>type</code> contains the types of the elements in each block;
* <code>newtype</code> (an output) contains the new derived type created by this function.
 
The <code>disp</code> (displacements) array is needed for [[data structure alignment]], since the compiler may pad the variables in a class or data structure. The safest way to find the distance between different fields is by obtaining their addresses in memory. This is done with <code>MPI_Get_address</code>, which is normally the same as C's <code>&</code> operator but that might not be true when dealing with [[memory segmentation]].<ref>{{cite web|url=http://www.mpich.org/static/docs/v3.1/www3/MPI_Get_address.html|title=MPI_Get_address|website=www.mpich.org}}</ref>
 
Passing a data structure as one block is significantly faster than passing one item at a time, especially if the operation is to be repeated. This is because fixed-size blocks do not require [[serialization]] during transfer.<ref>[http://www.boost.org/doc/libs/1_55_0/doc/html/mpi/python.html#mpi.python_skeleton_content Boost.MPI Skeleton/Content Mechanism rationale] (performance comparison graphs were produced using [[NetPIPE]])</ref>
 
Given the following data structures:
<syntaxhighlight lang="c">
struct A {
    int f;
    short p;
};

struct B {
    struct A a;
    int pp, vp;
};
</syntaxhighlight>
 
Here's the C code for building an MPI-derived data type:
<syntaxhighlight lang="c">
static const int blocklen[] = {1, 1, 1, 1};
static const MPI_Aint disp[] = {
    offsetof(struct B, a) + offsetof(struct A, f),
    offsetof(struct B, a) + offsetof(struct A, p),
    offsetof(struct B, pp),
    offsetof(struct B, vp)
};
static MPI_Datatype type[] = {MPI_INT, MPI_SHORT, MPI_INT, MPI_INT};
MPI_Datatype newtype;
MPI_Type_create_struct(sizeof(type) / sizeof(*type), blocklen, disp, type, &newtype);
MPI_Type_commit(&newtype);
</syntaxhighlight>
 
==MPI-2 concepts==
 
===One-sided communication===
MPI-2 defines three one-sided communication operations, <code>MPI_Put</code>, <code>MPI_Get</code>, and <code>MPI_Accumulate</code>: a write to remote memory, a read from remote memory, and a reduction operation on the same memory across a number of tasks, respectively. Also defined are three different methods to synchronize this communication (global, pairwise, and remote locks), as the specification does not guarantee that these operations have taken place until a synchronization point.
 
These types of call can often be useful for algorithms in which synchronization would be inconvenient (e.g. distributed [[matrix multiplication]]), or where it is desirable for tasks to be able to balance their load while other processors are operating on data.
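
An illustrative sketch of the global (fence) synchronization style, in which rank 0 writes directly into a window exposed by rank 1; the value written is arbitrary and at least two ranks are assumed.

<syntaxhighlight lang="c">
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, value = 99, buf = 0;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Expose one integer of local memory as a window for one-sided access. */
    MPI_Win_create(&buf, sizeof(int), sizeof(int), MPI_INFO_NULL,
                   MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);              /* open an access epoch */
    if (rank == 0) {
        /* Write 'value' into the window exposed by rank 1 without rank 1
           issuing a matching receive. */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);              /* close the epoch; the transfer is complete */

    if (rank == 1)
        printf("Rank 1 window now holds %d\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}
</syntaxhighlight>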
 
===Dynamic process management===
{{Expand section|date=June 2008}}
 
The key aspect is "the ability of an MPI process to participate in the creation of new MPI processes or to establish communication with MPI processes that have been started separately." The MPI-2 specification describes three main interfaces by which MPI processes can dynamically establish communications, <code>MPI_Comm_spawn</code>, <code>MPI_Comm_accept</code>/<code>MPI_Comm_connect</code> and <code>MPI_Comm_join</code>. The <code>MPI_Comm_spawn</code> interface allows an MPI process to spawn a number of instances of the named MPI process. The newly spawned set of MPI processes form a new <code>MPI_COMM_WORLD</code> intracommunicator but can communicate with the parent and the intercommunicator the function returns. <code>MPI_Comm_spawn_multiple</code> is an alternate interface that allows the different instances spawned to be different binaries with different arguments.<ref name="Gropp99adv-p7">{{harvnb |Gropp |Lusk |Skjelling |1999b |p=7 }}</ref>
 
===I/O===
The parallel I/O feature is sometimes called MPI-IO,<ref name="Gropp99adv-pp5-6">{{harvnb |Gropp |Lusk |Skjellum |1999b |pp=5–6 }}</ref> and refers to a set of functions designed to abstract I/O management on distributed systems to MPI, and to allow files to be easily accessed in a patterned way using the existing derived datatype functionality.
 
The little research that has been done on this feature indicates that it may not be trivial to get high performance gains by using MPI-IO. For example, an implementation of sparse [[Matrix multiplication|matrix-vector multiplications]] using the MPI I/O library shows a general behavior of minor performance gain, but these results are inconclusive.<ref>{{Cite web|url=http://marcovan.hulten.org/report.pdf|title=Sparse matrix-vector multiplications using the MPI I/O library}}</ref> It was not
until the idea of collective I/O<ref>{{cite web|title=Data Sieving and Collective I/O in ROMIO|url=http://www.mcs.anl.gov/~thakur/papers/romio-coll.pdf|publisher=IEEE|date=Feb 1999}}</ref> was implemented in MPI-IO that MPI-IO started to reach widespread adoption. Collective I/O substantially boosts applications' I/O bandwidth by having processes collectively transform the small and noncontiguous I/O operations into large and contiguous ones, thereby reducing the [[Record locking|locking]] and disk seek overhead. Due to its vast performance benefits, MPI-IO also became the underlying I/O layer for many state-of-the-art I/O libraries, such as [[HDF5]] and [[NetCDF|Parallel NetCDF]]. Its popularity also triggered research on collective I/O optimizations, such as layout-aware I/O<ref>{{cite book|chapter=LACIO: A New Collective I/O Strategy for Parallel I/O Systems|publisher=IEEE|date=Sep 2011|doi=10.1109/IPDPS.2011.79|isbn=978-1-61284-372-8|citeseerx=10.1.1.699.8972|title=2011 IEEE International Parallel & Distributed Processing Symposium|last1=Chen|first1=Yong|last2=Sun|first2=Xian-He|last3=Thakur|first3=Rajeev|last4=Roth|first4=Philip C.|last5=Gropp|first5=William D.|pages=794–804|s2cid=7110094}}</ref> and cross-file aggregation.<ref>{{cite journal|author1=Teng Wang|author2=Kevin Vasko|author3=Zhuo Liu|author4=Hui Chen|author5=Weikuan Yu|title=Enhance parallel input/output with cross-bundle aggregation|journal=The International Journal of High Performance Computing Applications|volume=30|issue=2|pages=241–256|date=2016|doi=10.1177/1094342015618017|s2cid=12067366}}</ref><ref>{{cite book|chapter=BPAR: A Bundle-Based Parallel Aggregation Framework for Decoupled I/O Execution|publisher=IEEE|date=Nov 2014|doi=10.1109/DISCS.2014.6|isbn=978-1-4673-6750-9|title=2014 International Workshop on Data Intensive Scalable Computing Systems|last1=Wang|first1=Teng|last2=Vasko|first2=Kevin|last3=Liu|first3=Zhuo|last4=Chen|first4=Hui|last5=Yu|first5=Weikuan|pages=25–32|s2cid=2402391}}</ref>
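
An illustrative fragment of independent MPI-IO, in which each rank writes one integer at a rank-dependent offset; the file name is a placeholder, and collective variants such as <code>MPI_File_write_at_all</code> are normally preferred for performance.

<syntaxhighlight lang="c">
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    value = rank;

    /* Every process writes its rank into a shared file at an offset
       determined by that rank ("output.dat" is a placeholder name). */
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, (MPI_Offset)rank * sizeof(int), &value, 1, MPI_INT,
                      MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>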
 
==Official implementations==
 
* The initial implementation of the MPI 1.x standard was [[MPICH]], from [[Argonne National Laboratory]] (ANL) and [[Mississippi State University]]. [[IBM]] also was an early implementor, and most early 90s [[supercomputer]] companies either commercialized MPICH, or built their own implementation. [[LAM/MPI]] from [[Ohio Supercomputer Center]] was another early open implementation. ANL has continued developing MPICH for over a decade, and now offers MPICH-4.3.0, implementing the MPI-4.1 standard.
* [[Open MPI]] (not to be confused with [[OpenMP]]) was formed by the merger of FT-MPI, LA-MPI, [[LAM/MPI]], and PACX-MPI, and is found in many [[TOP-500]] [[supercomputer]]s.
 
Many other efforts are derivatives of MPICH, LAM, and other works, including, but not limited to, commercial implementations from [[Hewlett Packard Enterprise|HPE]], [[Intel]], [[Microsoft]], and [[NEC]].
 
While the specifications mandate a C and Fortran interface, the language used to implement MPI is not constrained to match the language or languages it seeks to support at runtime. Most implementations combine C, C++ and assembly language, and target C, C++, and Fortran programmers. Bindings are available for many other languages, including Perl, Python, R, Ruby, Java, and [[Control Language|CL]] (see [[#Language bindings]]).
 
The [[application binary interface|ABI]]s of MPI implementations are roughly split between [[MPICH]] and [[Open MPI]] derivatives, so that a library from one family works as a drop-in replacement for one from the same family, but direct replacement across families is impossible. The French [[French Alternative Energies and Atomic Energy Commission|CEA]] maintains a wrapper interface to facilitate such switches.<ref>{{cite web |author1=cea-hpc |title=cea-hpc/wi4mpi: Wrapper interface for MPI |url=https://github.com/cea-hpc/wi4mpi |website=GitHub |language=en}}</ref>
 
===Hardware===
MPI hardware research focuses on implementing MPI directly in hardware, for example via [[processor-in-memory]], building MPI operations into the microcircuitry of the [[Random-access memory|RAM]] chips in each node. By implication, this approach is independent of language, operating system, and CPU, but cannot be readily updated or removed.
 
Another approach has been to add hardware acceleration to one or more parts of the operation, including hardware processing of MPI queues and using [[Remote direct memory access|RDMA]] to directly transfer data between memory and the [[network interface controller]] without CPU or OS kernel intervention.
 
===Compiler wrappers===
'''mpicc''' (and similarly '''mpic++''', '''mpif90''', etc.) is a program that wraps over an existing compiler to set the necessary command-line flags when compiling code that uses MPI. Typically, it adds a few flags that enable the code to be compiled and linked against the MPI library.<ref>[http://www.mpich.org/static/docs/latest/www1/mpicc.html mpicc]. Mpich.org. Retrieved on 2014-03-24.</ref>
 
==Language bindings==
[[Language binding|Bindings]] are libraries that extend MPI support to other languages by wrapping an existing MPI implementation such as MPICH or Open MPI.
 
===Common Language Infrastructure===
The two managed [[Common Language Infrastructure]] [[.NET Framework|.NET]] implementations are Pure Mpi.NET<ref>{{Cite web|url=http://www.purempi.net/|title=Pure Mpi.NET|date=June 30, 2024}}</ref> and MPI.NET,<ref>{{cite web|url=http://www.osl.iu.edu/research/mpi.net/|title=MPI.NET: High-Performance C# Library for Message Passing|website=www.osl.iu.edu}}</ref> a research effort at [[Indiana University]] licensed under a [[BSD]]-style license. It is compatible with [[Mono (software)|Mono]], and can make full use of underlying low-latency MPI network fabrics.
 
===Java===
Although [[Java (programming language)|Java]] does not have an official MPI binding, several groups attempt to bridge the two, with different degrees of success and compatibility. One of the first attempts was Bryan Carpenter's mpiJava,<ref>{{cite web|url=http://www.hpjava.org/mpiJava.html|title=mpiJava Home Page|website=www.hpjava.org}}</ref> essentially a set of [[Java Native Interface]] (JNI) wrappers to a local C MPI library, resulting in a hybrid implementation with limited portability, which also has to be compiled against the specific MPI library being used.
 
However, this original project also defined the mpiJava API<ref>{{cite web|url=http://www.hpjava.org/theses/shko/thesis_paper/node33.html|title=Introduction to the mpiJava API|website=www.hpjava.org}}</ref> (a [[de facto]] MPI [[API]] for Java that closely followed the equivalent C++ bindings) which other subsequent Java MPI projects adopted. One less-used API is MPJ API, which was designed to be more [[Object-oriented programming|object-oriented]] and closer to [[Sun Microsystems]]' coding conventions.<ref>{{cite web|url=http://www.hpjava.org/papers/MPJ-CPE/cpempi/node6.html|title=The MPJ API Specification|website=www.hpjava.org}}</ref> Beyond the API, Java MPI libraries can be either dependent on a local MPI library, or implement the message passing functions in Java, while some like [[P2P-MPI]] also provide [[peer-to-peer]] functionality and allow mixed-platform operation.
 
Some of the most challenging parts of Java/MPI arise from Java characteristics such as the lack of explicit [[data pointer|pointers]] and the [[Flat memory model|linear memory]] address space for its objects, which make transferring multidimensional arrays and complex objects inefficient. Workarounds usually involve transferring one line at a time and/or performing explicit de-[[serialization]] and [[Cast (computer science)|casting]] at both the sending and receiving ends, simulating C or Fortran-like arrays by the use of a one-dimensional array, and pointers to primitive types by the use of single-element arrays, thus resulting in programming styles quite far from Java conventions.
 
Another Java message passing system is MPJ Express.<ref>{{cite web|url=http://mpj-express.org/|title=MPJ Express Project|website=mpj-express.org}}</ref> Recent versions can be executed in cluster and multicore configurations. In the cluster configuration, it can execute parallel Java applications on clusters and clouds. Here Java sockets or specialized I/O interconnects like [[Myrinet]] can support messaging between MPJ Express processes. It can also utilize native C implementation of MPI using its native device. In the multicore configuration, a parallel Java application is executed on multicore processors. In this mode, MPJ Express processes are represented by Java threads.
 
===Julia===
There is a [[Julia (programming language)|Julia]] language wrapper for MPI.<ref>{{Citation|title=JuliaParallel/MPI.jl|date=2019-10-03|url=https://github.com/JuliaParallel/MPI.jl|publisher=Parallel Julia|access-date=2019-10-08}}</ref>
 
===MATLAB===
There are a few academic implementations of MPI using [[MATLAB]]. MATLAB has its own parallel extension library implemented using MPI and [[Parallel Virtual Machine|PVM]].
 
===OCaml===
The OCamlMPI Module<ref>{{cite web|url=http://cristal.inria.fr/~xleroy/software.html#ocamlmpi|title=Xavier Leroy - Software|website=cristal.inria.fr}}</ref> implements a large subset of MPI functions and is in active use in scientific computing. An 11,000-line [[OCaml]] program was "MPI-ified" using the module, with an additional 500 lines of code and slight restructuring and ran with excellent results on up to 170 nodes in a supercomputer.<ref>[http://caml.inria.fr/pub/ml-archives/caml-list/2003/07/155910c4eeb09e684f02ea4ae342873b.en.html Archives of the Caml mailing list > Message from Yaron M. Minsky]. Caml.inria.fr (2003-07-15). Retrieved on 2014-03-24.</ref>
 
===PARI/GP===
 
[[PARI/GP]] can be built<ref>{{cite web|url=https://pari.math.u-bordeaux.fr/pub/pari/manuals/2.13.3/parallel.pdf|title=Introduction to parallel GP|website=pari.math.u-bordeaux.fr}}</ref> to use MPI as its multi-thread engine, allowing parallel PARI and GP programs to run unmodified on MPI clusters.
 
===Python===
Actively maintained MPI wrappers for [[Python (programming language)|Python]] include: mpi4py,<ref>{{Cite web|url=https://mpi4py.readthedocs.io/en/stable/|title=MPI for Python — MPI for Python 4.1.0 documentation|website=mpi4py.readthedocs.io}}</ref> numba-mpi<ref>{{Cite web|url=https://pypi.org/project/numba-mpi/|title=numba-mpi|website=pypi.org}}</ref> and mpi4jax.<ref>{{Cite web|url=https://mpi4jax.readthedocs.io/en/latest/|title=mpi4jax — mpi4jax documentation|website=mpi4jax.readthedocs.io}}</ref>
 
Discontinued developments include: pyMPI, pypar,<ref>{{cite web|url=https://code.google.com/p/pypar/|title=Google Code Archive - Long-term storage for Google Code Project Hosting.|website=code.google.com}}</ref> MYMPI<ref>Now part of [https://sourceforge.net/projects/pydusa/ Pydusa]</ref> and the MPI submodule in [[ScientificPython]].
 
===R===
[[R (programming language)|R]] bindings of MPI include [[Rmpi]]<ref>{{cite journal |last=Yu |first=Hao |title=Rmpi: Parallel Statistical Computing in R |year=2002 |url=https://cran.r-project.org/package=Rmpi |journal=R News }}</ref> and [[Programming with Big Data in R|pbdMPI]],<ref>{{cite web |last1=Chen |first1=Wei-Chen |last2=Ostrouchov |first2=George |last3=Schmidt |first3=Drew |last4=Patel |first4=Pragneshkumar |last5=Yu |first5=Hao |title=pbdMPI: Programming with Big Data -- Interface to MPI |year=2012 |url=https://cran.r-project.org/package=pbdMPI }}</ref> where Rmpi focuses on [[Master/slave (technology)|manager-workers]] parallelism while pbdMPI focuses on [[SPMD]] parallelism. Both implementations fully support [[Open MPI]] or [[MPICH2]].
 
==Example program==
Here is a [["Hello, World!" program]] in MPI written in C. In this example, we send a "hello" message to each processor, manipulate it trivially, sendreturn the results back to the main process, and print the messages out.
 
<syntaxhighlight lang="c">
/*
  "Hello World" MPI Test Program
*/
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define BUFSIZE 128
#define TAG 0

int main(int argc, char *argv[])
{
    char idstr[32];
    char buff[BUFSIZE];
    int numprocs;
    int myid;
    int i;
    MPI_Status stat;

    MPI_Init(&argc, &argv);                   /* all MPI programs start with MPI_Init; all 'N' processes exist thereafter */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs); /* find out how big the SPMD world is */
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);     /* and this process's rank within it */

    /* At this point, all programs are running equivalently; the rank
       distinguishes the roles of the programs in the SPMD model, with
       rank 0 often used specially... */
    if (myid == 0)
    {
        printf("%d: We have %d processors\n", myid, numprocs);
        for (i = 1; i < numprocs; i++)
        {
            sprintf(buff, "Hello %d! ", i);
            MPI_Send(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD);
        }
        for (i = 1; i < numprocs; i++)
        {
            MPI_Recv(buff, BUFSIZE, MPI_CHAR, i, TAG, MPI_COMM_WORLD, &stat);
            printf("%d: %s\n", myid, buff);
        }
    }
    else
    {
        /* receive from rank 0: */
        MPI_Recv(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD, &stat);
        sprintf(idstr, "Processor %d ", myid);
        strcat(buff, idstr);
        strcat(buff, "reporting for duty\n");
        /* send to rank 0: */
        MPI_Send(buff, BUFSIZE, MPI_CHAR, 0, TAG, MPI_COMM_WORLD);
    }

    MPI_Finalize(); /* MPI programs end with MPI_Finalize; this is a weak synchronization point */
    return 0;
}
</syntaxhighlight>
 
The runtime environment for the MPI implementation used (often called mpirun or mpiexec) spawns multiple copies of this program text, with the total number of copies determining the number of process ''ranks'' in MPI_COMM_WORLD, which is an opaque descriptor for communication between the set of processes. A Single-Program-Multiple-Data ([[SPMD]]) programming model is thereby facilitated. Each process has its own rank, knows the total number of processes in the world, and can communicate with the others either by point-to-point (send/receive) communication or by collective communication among the group. It is enough for MPI to provide an SPMD-style program with MPI_COMM_WORLD, its own rank, and the size of the world to allow algorithms to decide what to do based on their rank. In more realistic examples, additional I/O to the outside world is of course needed. MPI does not guarantee how POSIX I/O, as used in the example, behaves on a given system, but it commonly does work, at least from rank 0. Even where it works, POSIX I/O such as printf() is not particularly scalable and should be used sparingly.

MPI deals with processes rather than processors: the copies of this program are ''mapped'' to processors by the MPI runtime environment. In that sense, the parallel machine can map to one physical processor, or to N processors, where N is the total number of processors available, or to something in between. For maximal parallel speedup, more physical processors are used, but the ability to separate the mapping from the design of the program is an essential value for development, as well as for practical situations where resources are limited. Note also that the example adjusts its behavior to the size of the world N, so it scales to whatever size is given at runtime; there is no separate compilation for each degree of concurrency, although a program may take different decisions internally depending on the absolute amount of concurrency provided to it.

==Adoption of MPI-2==
While the adoption of MPI-1.2 has been universal, including on almost all cluster computing, the acceptance of MPI-2.1 has been more limited, for several reasons.

# While MPI-1.2 emphasizes message passing and a minimal, static runtime environment, full MPI-2 implementations include I/O and dynamic process management, and the middleware implementation is substantially larger. Furthermore, most sites that use batch scheduling systems cannot support dynamic process management. Parallel I/O, by contrast, is well accepted as a key value of MPI-2.
# Many legacy MPI-1.2 programs were already developed by the time MPI-2 appeared, and they continue to work. The threat of lost portability from using MPI-2 functions kept people from adopting the enhanced standard for many years, although this concern has lessened since the mid-2000s as support for MPI-2 has widened.
# Many MPI-1.2 applications use only a small subset of that standard (16–25 functions). This minimalism of use contrasts with the much larger set of functionality afforded by MPI-2.

Other inhibiting factors can be cited too, although these may amount more to perception and belief than fact. MPI-2 has been well supported in free and commercial implementations since at least the early 2000s, with some implementations coming earlier than that.
MPI_Recv(buf, 256, MPI_CHAR, other_rank,
0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
printf("%s\n", buf);
}
 
} else {
==The Future of MPI==
There are several schools of thought on this. The MPI Forum has been dormant for a decade, but maintains its mailing list.
Recently (late 2006), the mailing list was revived, for the purpose of clarifying MPI-2 issues, and possibly for
defining a new standard level.
 
/* Receive message from process #0 */
1. MPI as a legacy interface is guaranteed to exist at the MPI-1.2 and MPI-2.1 levels for many years to come. Like
MPI_Recv(buf, 256, MPI_CHAR, 0,
Fortran, it is ubiquitous in technical computing, taught everywhere, and used everywhere. The body of free and commercial products that require MPI help ensure that will go on indefinitely, as will new ports of the existing free and commercial implementations to new target platforms.
0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
assert(memcmp(buf, "Hello ", 6) == 0);
 
/* Send message to process #0 */
2. Architectures are changing, with greater internal concurrency (multi-core), better fine-grain concurrency control (threading, affinity), and more levels of memory hierarchy. This has already yielded separate, complementary standards
sprintf(buf, "Process %i reporting for duty.", my_rank);
for SMP programming, namely OpenMP. However, in future, both massive scale and multi-granular concurrency reveal limitations
MPI_Send(buf, 256, MPI_CHAR, 0,
of the MPI standard, which is only tangentially friendly to multithreaded programming, and does not specify enough about
0, MPI_COMM_WORLD);
how multi-threaded programs should be written. While multi-threaded capable MPI implementations do exist, the number
of multithreaded, message passing applications are few. The drive to achieve multi-level concurrency all within MPI
is both a challenge and an opportunity for the standard in future.
 
}
3. The number of functions is huge, though as noted above, the number of concepts is relatively small. However, given
that many users don't use the majority of the capabilities of MPI-2, a future standard might be smaller as well as more
focused, or have profiles to allow different users to get what they need without waiting for a complete implementation
suite, or have all that code be validated from a software engineering point of view.
 
/* Tear down the communication infrastructure */
4. Grid computing, and virtual grid computing offer MPI's way of handling static and dynamic process management with
MPI_Finalize();
particular 'fits'. While it is possible for force the MPI model into working on a grid, the idea of a fault-free,
return 0;
long-running virtual machine under the MPI program is a forced on in a grid environment. Grids may want to instantiate
}
MPI APIs between sets of running processes, but multi-level middleware that addresses concurrency, faults, and message
</syntaxhighlight>
 
When run with 4 processes, it should produce the following output:<ref>The output snippet was produced on an ordinary Linux desktop system with Open MPI installed. [[Linux distribution|Distro]]s usually place the mpicc command into an openmpi-devel or libopenmpi-dev package, and sometimes make it necessary to run "module add mpi/openmpi-x86_64" or similar before mpicc and mpiexec are available.</ref>
<pre>
$ mpicc example.c && mpiexec -n 4 ./a.out
We have 4 processes.
Process 1 reporting for duty.
Process 2 reporting for duty.
Process 3 reporting for duty.
</pre>
Here, <code>mpiexec</code> is a command used to execute the example program with 4 [[process (computing)|processes]], each of which is an independent instance of the program at run time and is assigned a rank (i.e. a numeric ID) of 0, 1, 2, or 3. The name <code>mpiexec</code> is recommended by the MPI standard, although some implementations provide a similar command under the name <code>mpirun</code>. <code>MPI_COMM_WORLD</code> is the communicator that consists of all of the processes.
 
A single program, multiple data ([[SPMD]]) programming model is thereby facilitated, but not required; many MPI implementations allow multiple, different, executables to be started in the same MPI job. Each process has its own rank, the total number of processes in the world, and the ability to communicate between them either with point-to-point (send/receive) communication, or by collective communication among the group. It is enough for MPI to provide an SPMD-style program with <code>MPI_COMM_WORLD</code>, its own rank, and the size of the world to allow algorithms to decide what to do. In more realistic situations, I/O is more carefully managed than in this example. MPI does not stipulate how standard I/O (stdin, stdout, stderr) should work on a given system. It generally works as expected on the rank-0 process, and some implementations also capture and funnel the output from other processes.
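 
For instance, a collective operation can replace the explicit send/receive loops above. The following minimal sketch is not taken from the MPI standard; the variable names and the sum-of-integers workload are chosen purely for illustration. It broadcasts a problem size from rank 0 with <code>MPI_Bcast</code> and combines per-rank partial sums with <code>MPI_Reduce</code>:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int my_rank, num_procs, i;
    int n = 0;                 /* problem size; chosen on rank 0 only */
    long local_sum = 0, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_procs);

    if (my_rank == 0)
        n = 100;               /* illustrative value, not prescribed by MPI */

    /* Collective call: every rank participates and receives n from rank 0 */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Each rank sums a strided share of 1..n */
    for (i = my_rank + 1; i <= n; i += num_procs)
        local_sum += i;

    /* Collective call: partial sums are combined onto rank 0 */
    MPI_Reduce(&local_sum, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (my_rank == 0)
        printf("Sum of 1..%d is %ld\n", n, total);

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>

Because <code>MPI_Bcast</code> and <code>MPI_Reduce</code> are collective, every rank in the communicator must call them, unlike the pairwise matching of <code>MPI_Send</code> and <code>MPI_Recv</code> in the point-to-point example.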
 
MPI uses the notion of process rather than processor. Program copies are ''mapped'' to processors by the MPI [[Runtime system|runtime]]. In that sense, the parallel machine can map to one physical processor, or to ''N'' processors, where ''N'' is the number of available processors, or even something in between. For maximum parallel speedup, more physical processors are used. This example adjusts its behavior to the size of the world ''N'', so it also seeks to scale to the runtime configuration without compilation for each size variation, although runtime decisions might vary depending on that absolute amount of concurrency available.
 
==MPI-2 adoption==
Adoption of MPI-1.2 has been universal, particularly in cluster computing, but acceptance of MPI-2.1 has been more limited. Issues include:
 
# MPI-2 implementations include I/O and dynamic process management, and the size of the middleware is substantially larger. Most sites that use batch scheduling systems cannot support dynamic process management. MPI-2's parallel I/O (sketched below, after this list) is well accepted.{{Citation needed|date=January 2011}}
# Many MPI-1.2 programs were developed before MPI-2. Portability concerns initially slowed adoption, although wider support has lessened this.
# Many MPI-1.2 applications use only a subset of that standard (16–25 functions) with no real need for MPI-2 functionality.
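 
As a rough illustration of MPI-2 parallel I/O, the sketch below has every rank write one integer to a non-overlapping offset of a shared file. The file name <code>output.dat</code> and the rank-based offsets are illustrative assumptions, not part of any standard example:

<syntaxhighlight lang="c">
#include <mpi.h>

int main(int argc, char **argv)
{
    int my_rank;
    int value;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    value = my_rank;   /* each rank contributes its own rank number */

    /* All ranks open the same file collectively ("output.dat" is an illustrative name) */
    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Each rank writes one int at an offset determined by its rank, so writes do not overlap */
    MPI_File_write_at(fh, (MPI_Offset)my_rank * sizeof(int),
                      &value, 1, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
</syntaxhighlight>

Because each rank targets a distinct offset, no coordination beyond the collective open and close is required in this sketch.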
 
==Future==
Some aspects of MPI's future appear solid; others less so. The MPI Forum reconvened in 2007 to clarify some MPI-2 issues and explore developments for a possible MPI-3, which resulted in versions MPI-3.0 (September 2012)<ref>{{Cite web| title=MPI: A Message-Passing Interface Standard Version 3.0 | url=https://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf | archive-url=https://web.archive.org/web/20130319193248/http://www.mpi-forum.org:80/docs/mpi-3.0/mpi30-report.pdf | archive-date=2013-03-19}}</ref> and MPI-3.1 (June 2015).<ref>{{Cite web| title=MPI: A Message-Passing Interface Standard Version 3.1 | url=https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf | archive-url=https://web.archive.org/web/20150706095015/http://www.mpi-forum.org/docs/mpi-3.1/mpi31-report.pdf | archive-date=2015-07-06}}</ref> Development continued with the approval of MPI-4.0 on June 9, 2021,<ref>{{Cite web| title=MPI: A Message-Passing Interface Standard Version 4.0 | date=2021-06-09 | url=https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf | archive-url=https://web.archive.org/web/20210628174829/https://www.mpi-forum.org/docs/mpi-4.0/mpi40-report.pdf | archive-date=2021-06-28}}</ref> and of MPI-4.1 on November 2, 2023.<ref>{{Cite web| title=MPI: A Message-Passing Interface Standard Version 4.1 | date=2023-11-02 | url=https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf | archive-url=https://web.archive.org/web/20231115151248/https://www.mpi-forum.org/docs/mpi-4.1/mpi41-report.pdf | archive-date=2023-11-15}}</ref> MPI-5.0, approved on June 5, 2025, brought significant new functionality, notably the addition of a standard [[Application Binary Interface]] (ABI).<ref>https://www.mpi-forum.org/docs/mpi-5.0/mpi50-report.pdf {{Bare URL PDF|date=July 2025}}</ref>
 
Architectures are changing, with greater internal concurrency ([[Multi-core processor|multi-core]]), better fine-grained concurrency control (threading, affinity), and more levels of [[memory hierarchy]]. [[Multithreading (computer architecture)|Multithreaded]] programs can take advantage of these developments more easily than single-threaded applications. This has already yielded separate, complementary standards for [[symmetric multiprocessing]], namely [[OpenMP]]. MPI-2 defines how standard-conforming implementations should deal with multithreaded issues, but does not require that implementations be multithreaded, or even thread-safe. MPI-3 adds the ability to use shared-memory parallelism within a node. Implementations of MPI such as Adaptive MPI, Hybrid MPI, Fine-Grained MPI, MPC and others offer extensions to the MPI standard that address different challenges in MPI.
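 
For example, an application that intends to make MPI calls from several threads can request a thread-support level at startup with <code>MPI_Init_thread</code> and check what the implementation actually grants. The sketch below is a minimal illustration, not a complete multithreaded program:

<syntaxhighlight lang="c">
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;

    /* Request full multithreaded support; the implementation reports the level it provides */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        /* Fall back: only the main thread (or serialized threads) may call MPI */
        printf("Requested MPI_THREAD_MULTIPLE, got level %d\n", provided);
    }

    /* ... application code, possibly with several threads issuing MPI calls ... */

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>

Implementations are not required to provide <code>MPI_THREAD_MULTIPLE</code>, which is one reason multithreaded message-passing applications remain comparatively rare.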
 
Astrophysicist Jonathan Dursi wrote an opinion piece calling MPI obsolescent, pointing to newer technologies like the [[Chapel (programming language)|Chapel]] language, [[Unified Parallel C]], [[Apache Hadoop|Hadoop]], [[Apache Spark|Spark]] and [[Apache Flink|Flink]].<ref>{{cite web|url=https://www.dursi.ca/post/hpc-is-dying-and-mpi-is-killing-it|title=HPC is dying, and MPI is killing it|website=www.dursi.ca}}</ref> At the same time, nearly all of the projects in the [[exascale computing|Exascale Computing Project]] build explicitly on MPI; MPI has been shown to scale to the largest machines as of the early 2020s and is widely considered to stay relevant for a long time to come.
 
==See also==
{{Div col|colwidth=25em}}
* [[Actor model]]
* [[Bulk synchronous parallel]]
* [[Caltech Cosmic Cube]]
* [[Charm++]]
* [[Co-array Fortran]]
* [[Global Arrays]]
* [[Microsoft Messaging Passing Interface]]
* [[MVAPICH]]
* [[OpenHMPP]]
* [[Parallel Virtual Machine]] (PVM)
* [[Partitioned global address space]]
* [[Unified Parallel C]]
* [[X10 (programming language)]]
{{div col end}}
 
==References==
{{Reflist}}
 
==Further reading==
{{Div col|colwidth=30em}}
* {{FOLDOC|Message+Passing+Interface|Message Passing Interface}}
* Aoyama, Yukiya; Nakano, Jun (1999) ''[https://web.archive.org/web/20080119023608/http://www.redbooks.ibm.com/abstracts/sg245380.html RS/6000 SP: Practical MPI Programming]'', ITSO
* Foster, Ian (1995) ''Designing and Building Parallel Programs (Online)'' Addison-Wesley {{ISBN|0-201-57594-9}}, chapter 8 ''[http://www-unix.mcs.anl.gov/dbpp/text/node94.html#SECTION03500000000000000000 Message Passing Interface]''
* Wijesuriya, Viraj Brian (2010-12-29) [http://www.daniweb.com/forums/post1428830.html#post1428830 ''Daniweb: Sample Code for Matrix Multiplication using MPI Parallel Programming Approach'']
* ''Using MPI'' series:
** {{cite book |last1=Gropp |first1=William |last2=Lusk |first2=Ewing |last3=Skjellum |first3=Anthony |year=1994 |url=https://archive.org/details/usingmpiportable00grop |title=Using MPI: portable parallel programming with the message-passing interface |publisher=[[MIT Press]] Scientific And Engineering Computation Series |___location=Cambridge, MA, USA |isbn=978-0-262-57104-3 }}
** {{cite book
|last1=Gropp |first1=William |last2=Lusk |first2=Ewing |last3=Skjellum |first3=Anthony
|year=1999a
|url=http://mitpress.mit.edu/books/using-mpi-second-edition
|title=Using MPI, 2nd Edition: Portable Parallel Programming with the Message Passing Interface
|publisher=[[MIT Press]] Scientific And Engineering Computation Series |___location=Cambridge, MA, USA |isbn=978-0-262-57132-6 }}
** {{cite book
|last1=Gropp |first1=William |last2=Lusk |first2=Ewing |last3=Skjellum |first3=Anthony
|year=1999b
|url=http://mitpress.mit.edu/books/using-mpi-2
|title=Using MPI-2: Advanced Features of the Message Passing Interface
|publisher=[[MIT Press]] |isbn=978-0-262-57133-3 }}
** {{cite book
|last1=Gropp |first1=William |last2=Lusk |first2=Ewing |last3=Skjellum |first3=Anthony
|year=2014
|url=http://mitpress.mit.edu/books/using-mpi-third-edition
|title=Using MPI, 3rd edition: Portable Parallel Programming with the Message-Passing Interface
|publisher=[[MIT Press]] Scientific And Engineering Computation Series |___location=Cambridge, MA, USA |isbn=978-0-262-52739-2 }}
* {{Cite journal
|last1=Gropp |first1=William |last2=Lusk |first2=Ewing |last3=Skjellum |first3=Anthony
|year=1996
|citeseerx = 10.1.1.102.9485
|title=A High-Performance, Portable Implementation of the MPI Message Passing Interface
|journal=Parallel Computing |volume=22 |issue=6 |pages=789–828 |doi=10.1016/0167-8191(96)00024-5 }}
* Pacheco, Peter S. (1997) ''[https://books.google.com/books?&id=tCVkM1z2aOoC Parallel Programming with MPI]''.[http://www.cs.usfca.edu/mpi/ Parallel Programming with MPI] 500 pp. Morgan Kaufmann {{ISBN|1-55860-339-5}}.
* ''MPI—The Complete Reference'' series:
** Snir, Marc; Otto, Steve W.; Huss-Lederman, Steven; Walker, David W.; Dongarra, Jack J. (1995) ''[http://www.netlib.org/utk/papers/mpi-book/mpi-book.html MPI: The Complete Reference]''. MIT Press Cambridge, MA, USA. {{ISBN|0-262-69215-5}}
** Snir, Marc; Otto, Steve W.; Huss-Lederman, Steven; Walker, David W.; Dongarra, Jack J. (1998) ''MPI—The Complete Reference: Volume 1, The MPI Core''. MIT Press, Cambridge, MA. {{ISBN|0-262-69215-5}}
** Gropp, William; Huss-Lederman, Steven; Lumsdaine, Andrew; Lusk, Ewing; Nitzberg, Bill; Saphir, William; and Snir, Marc (1998) ''[https://web.archive.org/web/20010803093058/http://mitpress.mit.edu/book-home.tcl?isbn=0262571234 MPI—The Complete Reference: Volume 2, The MPI-2 Extensions]''. MIT Press, Cambridge, MA {{ISBN|978-0-262-57123-4}}
* Firuziaan, Mohammad; Nommensen, O. (2002) ''Parallel Processing via MPI & OpenMP'', Linux Enterprise, 10/2002
* Vanneschi, Marco (1999) ''Parallel paradigms for scientific computing'' In Proceedings of the European School on Computational Chemistry (1999, Perugia, Italy), number 75 in ''[https://books.google.com/books?&id=zMqVdFgVnrgC Lecture Notes in Chemistry]'', pages 170–183. Springer, 2000
* Bala, Bruck, Cypher, Elustondo, A Ho, CT Ho, Kipnis, Snir (1995) "[https://ieeexplore.ieee.org/abstract/document/342126/ A portable and tunable collective communication library for scalable parallel computers]", ''IEEE Transactions on Parallel and Distributed Systems'', vol. 6, no. 2, pp.&nbsp;154–164, Feb 1995.
{{div col end}}
 
==External links==
{{Wikibooks|Message-Passing Interface}}
* {{Official website|https://www.mpi-forum.org/}}
* [https://www.mpi-forum.org/docs/mpi-5.0/mpi50-report.pdf Official MPI-5.0 standard] ([https://www.mpi-forum.org/docs/mpi-5.0/mpi50-report/mpi50-report.htm unofficial HTML version])
* [http://polaris.cs.uiuc.edu/~padua/cs320/mpi/tutorial.pdf Tutorial on MPI: The Message-Passing Interface]
* [http://moss.csc.ncsu.edu/~mueller/cluster/mpi.guide.pdf A User's Guide to MPI]
* [https://www.citutor.org/bounce.php?course=21 Tutorial: Introduction to MPI (self-paced, includes self-tests and exercises)]
{{Parallel computing|state=collapsed}}
 
[[Category:Application programming interfaces]]
[[Category:Parallel computing]]
[[Category:Articles with example C code]]