A quick overview of MPI's send modes
MPI has a number of different "send modes." These represent different
choices of buffering (where is the data kept until it is received)
and synchronization (when does a send complete).
In the following, I use "send buffer" for the user-provided buffer to send.
- MPI_Send
-
MPI_Send will not return until you can use the send buffer. It
may or may not block (it is allowed to buffer, either on the sender or
receiver side, or to wait for the matching receive).
- MPI_Bsend
-
May buffer; returns immediately and you can use
the send buffer. A late add-on to the MPI specification. Should be used
only when absolutely necessary.
- MPI_Ssend
- Will not return until the matching receive has been posted.
- MPI_Rsend
-
May be used ONLY if matching receive already posted. User responsible
for writing a correct program.
- MPI_Isend
-
Nonblocking send, but not necessarily asynchronous. You can NOT reuse the send buffer
until either a successful wait/test or you KNOW that the message
has been received (see MPI_Request_free).
Note also that while the I refers to immediate, there is no
performance requirement on MPI_Isend. An immediate send must return to the
user without requiring a matching receive at the destination. An
implementation is free to send the data to the destination before returning,
as long as the send call does not block waiting for a matching receive.
Different strategies of when to send the data offer different performance
advantages and disadvantages that will depend on the application.
- MPI_Ibsend
- Buffered nonblocking.
- MPI_Issend
-
Synchronous nonblocking.
Note that a Wait/Test will complete only when the matching receive
is posted.
- MPI_Irsend
-
As with MPI_Rsend, but nonblocking.
Note that "nonblocking" refers ONLY to whether the data buffer is available
for reuse after the call. No part of the MPI specification, for example,
mandates concurrent operation of data transfers and computation.
Some people have expressed concern
about not having a single "perfect" send routine. But note that
in general you can't write code in Fortran that will run at optimum speed on
both Vector and RISC/Cache machines without picking different code for
the different architectures. MPI at least lets you express the different
algorithms, just like C or Fortran.
Recommendations
The best performance is likely if you can write your program so that you could
use just MPI_Ssend; in that case, an MPI implementation can completely avoid
buffering data. Even so, call MPI_Send rather than MPI_Ssend in the code itself;
this allows the MPI implementation the maximum flexibility in choosing how to
deliver your data.
(Unfortunately, one vendor has chosen to have MPI_Send emphasize buffering
over performance; on that system, MPI_Ssend may perform better.)
If nonblocking routines are necessary, then try to use MPI_Isend or MPI_Irecv.
Use MPI_Bsend only when it is too inconvenient to use MPI_Isend.
The remaining routines, MPI_Rsend, MPI_Issend, etc., are rarely used but may
be of value in writing system-dependent message-passing code entirely within
MPI.