MPICH2 Frequently Asked Questions
General Information
Building MPICH2
Windows version of MPICH2
Compiling MPI Programs
Running MPI Programs
MPICH2 is a freely available, portable implementation of
MPI, the Standard for
message-passing libraries. It implements both MPI-1 and MPI-2.
A: MPI stands for Message Passing Interface.
The CH comes from Chameleon, the portability layer used in the original
MPICH to provide portability to the existing message-passing systems.
A: There are two common ways to use MPI with multicore processors or
multiprocessor nodes:
Use one MPI process per core (here, a core is defined as a program
counter and some set of arithmetic, logic, and load/store units).
Use one MPI process per node (here, a node is defined as a collection
of cores that share a single address space). Use threads or
compiler-provided parallelism to exploit the multiple cores. OpenMP
may be used with MPI; the loop-level parallelism of OpenMP may be used
with any implementation of MPI (you do not need an MPI that supports
MPI_THREAD_MULTIPLE when threads are used only for computational
tasks). This is sometimes called the hybrid programming model.
MPD is the default process manager for MPICH2 on Unix platforms. It is written in
Python. SMPD is the primary process manager for MPICH2 on Windows. It
is also used for running on a combination of Windows and Linux
machines. It is written in C.
No, in many cases you can build MPICH2 using one set of compilers and then
use the libraries (and compilation scripts) with other compilers. However,
this depends on the compilers producing
compatible object files.
Specifically, the compilers must
The above may seem like a stringent set of requirements, but in practice, many
systems and compiler sets meet these needs, if for no other reason than that
any software built with multiple libraries will have requirements similar to
those of MPICH2 for compatibility.
If your compilers are completely compatible, down to the runtime libraries,
you may use the compilation scripts (mpicc etc.) by either
specifying the compiler on the command line, e.g.
mpicc -cc=icc -c foo.c
or with the environment variables
MPICH_CC etc. (this example assumes
C shell syntax):
setenv MPICH_CC icc
mpicc -c foo.c
If the compiler is compatible
except for the runtime libraries, then
this same format works as long as a configuration file that describes the
necessary runtime libraries is created and placed into the appropriate
directory (the ``sysconfdir'' directory in configure terms). See the
installation manual for more details.
In some cases, MPICH2 is able to build the Fortran interfaces in a way that
supports multiple mappings of names from the Fortran source code to the
object file. This is done by using the ``multiple weak symbol'' support in
some environments. For example, when using gcc under Linux, this is
the default.
A: You have several options. One is to use the Fortran 90 compiler for both
F77 and F90. Another (if you do not need Fortran 90) is to use
--disable-f90 when configuring. The options with which we test
MPICH2 and the Absoft compilers are the following:
setenv FFLAGS "-f -B108"
setenv F90FLAGS "-YALL_NAMES=LCS -B108"
setenv F77 f77
setenv F90 f90
A: FD_ZERO is part of the support for the select calls (see ``man
select'' or ``man 2 select'' on Linux and many other Unix systems). What this
means is that your system (probably a Mac) has a broken version of the
select call and related data types. This is an OS bug; the only repair is to
update the OS to get past this bug. This test was added specifically to
detect this error; if there were an easy way to work around it, we would have
included it (we don't just implement FD_ZERO ourselves because we
don't know what else is broken in this implementation of select).
If this configure test works with gcc but not with xlc, then the problem is with
the include files that xlc is using; since this is an OS call (even if
emulated), all compilers should be using consistent if not identical include
files. In this case, you may need to update xlc.
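For illustration, the kind of sanity check involved can be sketched in C: clear an fd_set with FD_ZERO and verify, using only the standard select macros, that no descriptor is reported as set afterwards. This is a minimal sketch, not the actual test in the MPICH2 configure script, and the function name fd_zero_works is hypothetical.

```c
#include <sys/select.h>

/* Hypothetical check: returns 1 if FD_ZERO, FD_SET, FD_ISSET and FD_CLR
   behave consistently on this system, 0 otherwise. */
int fd_zero_works(void)
{
    fd_set set;
    int fd;

    FD_ZERO(&set);
    /* After FD_ZERO, no descriptor below FD_SETSIZE may be marked. */
    for (fd = 0; fd < FD_SETSIZE; fd++) {
        if (FD_ISSET(fd, &set))
            return 0;
    }

    /* Setting one descriptor must not affect its neighbours. */
    FD_SET(1, &set);
    if (!FD_ISSET(1, &set) || FD_ISSET(0, &set) || FD_ISSET(2, &set))
        return 0;

    /* Clearing it again must leave that descriptor unmarked. */
    FD_CLR(1, &set);
    return !FD_ISSET(1, &set);
}
```

On a system with a working select implementation this function should return 1.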
A: The g95 compiler incorrectly defines the default Fortran integer as a
64-bit integer while defining Fortran reals as 32-bit values (the Fortran
standard requires that INTEGER and REAL be the same size). This was
apparently done to allow a Fortran INTEGER to hold the value of a pointer,
rather than requiring the programmer to select an INTEGER of a suitable KIND.
To force the g95 compiler to correctly implement the Fortran standard, use the
-i4 flag. For example, set the environment variable
F90FLAGS before configuring MPICH2:
setenv F90FLAGS "-i4"
G95 users should note that (at this writing) there are two distributions of
g95 for 64-bit Linux platforms. One uses 32-bit integers and reals (and
conforms to the Fortran standard) and one uses 32-bit integers and 64-bit
reals. We recommend using the one that conforms to the standard (note that
the standard specifies the
ratio of sizes, not the absolute sizes, so a
Fortran 95 compiler that used 64 bits for
both INTEGER and REAL would
also conform to the Fortran standard. However, such a compiler would need to
use 128 bits for DOUBLE PRECISION quantities).
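The size rule described above can be written as a small C check. This is a sketch: the function name is hypothetical, and the sizes are passed in as byte counts (in a real MPI program they could be obtained, for example, with MPI_Type_size on MPI_INTEGER, MPI_REAL and MPI_DOUBLE_PRECISION).

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if the given Fortran type sizes (in bytes)
   follow the ratios the Fortran standard requires (INTEGER and REAL occupy
   the same storage, DOUBLE PRECISION occupies twice that), 0 otherwise.
   Only the ratios are checked, not the absolute sizes. */
int fortran_sizes_conform(size_t integer_size, size_t real_size,
                          size_t double_size)
{
    return integer_size == real_size && double_size == 2 * real_size;
}
```

Thus fortran_sizes_conform(4, 4, 8) returns 1; the g95 default described above (64-bit INTEGER, 32-bit REAL) corresponds to fortran_sizes_conform(8, 4, 8), which returns 0; and a compiler that uses 64 bits for both INTEGER and REAL conforms only with 128-bit DOUBLE PRECISION, i.e. fortran_sizes_conform(8, 8, 16).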
Check whether you have set the environment variable CPPFLAGS. If so, unset
it and use CXXFLAGS instead. Then rerun configure and make.
The ssm and sshm channels do not work on all platforms because they
use special interprocess locks (often written in assembly language) that may not work
with some compilers or machine architectures. They work on Linux with
gcc, Intel, and Pathscale compilers on various Intel
architectures. They also work in Windows and Solaris environments.
Check the output of the configure step. If configure claims that ifort is a
cross compiler, the likely problem is that programs compiled and linked with
ifort cannot be run because of a missing shared library. Try to compile and
run the following program (named conftest.f90):
program conftest
integer, dimension(10) :: n
end
If this program fails to run, then the problem is that your installation of
ifort either has an error or you need to add additional values to your
environment variables (such as
LD_LIBRARY_PATH). Check your
installation documentation for the ifort compiler.
See
http://softwareforums.intel.com/ISN/Community/en-US/search/SearchResults.aspx?q=libimf.so
for an example of problems of this kind that users are having with version 9
of ifort.
If you do not need Fortran 90, you can configure with
--disable-f90.
Parallel make (often invoked with make -j4) will cause several
job steps in the build process to update the same library file
(libmpich.a) concurrently. Unfortunately, neither the
ar nor the ranlib programs correctly handle
this case, and the result is a corrupted library. For now, the
solution is to not use a parallel make when building MPICH2.
Some users may get error messages such as
SEEK_SET is #defined but must not be for the C++ binding of MPI
The problem is that both
stdio.h and the MPI C++ interface use
SEEK_SET,
SEEK_CUR, and
SEEK_END. This is really a
bug in the MPI-2 standard. You can try adding
#undef SEEK_SET
#undef SEEK_END
#undef SEEK_CUR
before
mpi.h is included, or add the definition
-DMPICH_IGNORE_CXX_SEEK
to the command line (this will cause the MPI versions of
SEEK_SET
etc. to be skipped).
Some users, particularly with older C++ compilers, may see error messages
of the form
"error C2555: 'MPI::Nullcomm::Clone' : overriding virtual function differs from
'MPI::Comm::Clone' only by return type or calling convention".
This is caused by the compiler not implementing part of the C++ standard.
To work around this problem, add the definition
-DHAVE_NO_VARIABLE_RETURN_TYPE_SUPPORT
to the
CXXFLAGS variable or add a
#define HAVE_NO_VARIABLE_RETURN_TYPE_SUPPORT 1
before including
mpi.h.
A: The specific method depends on the process manager and version of
mpiexec that you are using. See the appropriate specific section.
A: By default, all the environment variables in the shell where
mpiexec is run are passed to all processes of the application
program. (The one exception is
LD_LIBRARY_PATH when the
mpd's are being run as root.) This default can be overridden in many
ways, and individual environment variables can be passed to specific
processes using arguments to
mpiexec. A synopsis of the
possible arguments can be listed by typing
mpiexec -help
and further details are available in the
Users Guide.
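Because the environment is propagated, an MPI process can read such variables with the standard getenv function, just as a serial program would. A minimal sketch follows; the helper name env_or_default and the variable name MY_APP_INPUT used below are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: return the value of environment variable `name`
   as inherited by this process (normally from the shell in which mpiexec
   was run), or `fallback` if the variable is not set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return value != NULL ? value : fallback;
}
```

For example, after setenv MY_APP_INPUT data.txt in the shell that runs mpiexec, each process calling env_or_default("MY_APP_INPUT", "default.txt") would see "data.txt", unless mpiexec arguments were used to change what that process receives.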
A: Where processes run, whether by default or by specifying them
yourself, depends on the process manager being used.
If you are using the
gforker process manager, then all MPI
processes run on the same host where you are running
mpiexec.
If you are using the
mpd process manager, which is the default,
then many options are available. If you are using
mpd, then
before you run
mpiexec, you will have started, or will have had
started for you, a ring of processes called
mpd's
(multi-purpose daemons), each running on its own host. It is likely,
but not necessary, that each
mpd will be running on a separate
host. You can find out what this ring of hosts consists of by running
the program
mpdtrace. One of the
mpd's will be
running on the ``local'' machine, the one where you will run
mpiexec. The default placement of MPI processes, if one runs
mpiexec -n 10 a.out
is to start the first MPI process (rank 0) on the local machine and then
to distribute the rest around the
mpd ring one at a time. If
there are more processes than
mpd's, then wraparound occurs.
If there are more
mpd's than MPI processes, then some
mpd's will not run MPI processes. Thus any number of processes
can be run on a ring of any size. While one is doing development, it is
handy to run only one
mpd, on the local machine. Then all the
MPI processes will run locally as well.
The first modification to this default behavior is the
-1
option to
mpiexec (not a great argument name). If
-1
is specified, as in
mpiexec -1 -n 10 a.out
then the first application process will be started by the first
mpd in the ring
after the local host. (If there is only
one
mpd in the ring, then this will be on the local host.)
This option is for use when a cluster of compute nodes has a ``head
node'' where commands like
mpiexec are run but not application
processes.
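The default placement and the effect of -1 can be summarized by a simple formula. Number the mpd's in ring order, starting with 0 for the mpd on the local machine, and let N be the number of mpd's; then, without --ncpus, rank r is started by mpd r mod N by default and by mpd (r + 1) mod N with -1. The sketch below is a simplified model, not MPD's actual code, and the function name is hypothetical.

```c
/* Simplified model of MPD's default placement (hypothetical helper).
   rank:       MPI rank being placed (>= 0)
   nmpds:      number of mpd's in the ring (must be >= 1)
   skip_local: nonzero if mpiexec was given the -1 option
   Returns the ring position of the mpd that starts this rank, where 0 is
   the mpd on the local machine. */
int mpd_for_rank(int rank, int nmpds, int skip_local)
{
    /* Default: rank 0 on the local mpd, then one per mpd with wraparound.
       With -1 everything shifts one position along the ring; with a single
       mpd, (rank + 1) % 1 == 0, so it still lands on the local host. */
    return (rank + (skip_local ? 1 : 0)) % nmpds;
}
```

With four mpd's, this model places ranks 0, 4 and 8 of mpiexec -n 10 a.out on the local machine, and ranks 3 and 7 of mpiexec -1 -n 10 a.out.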
If an
mpd is started with the
--ncpus option, then
when it is its turn to start a process, it will start several
application processes rather than just one before handing off the task
of starting more processes to the next
mpd in the ring. For
example, if the
mpd is started with
mpd --ncpus=4
then it will start as many as four application processes, with
consecutive ranks, when it is its turn to start processes. This option
is for use in clusters of SMP's, when the user would like consecutive
ranks to appear on the same machine. (In the default case, the same
number of processes might well run on the machine, but their ranks would
be different.)
(A feature of the
--ncpus=[n] argument is that it has the above effect only
until all of the mpd's have started n processes at a time once;
afterwards each mpd starts one process at a time. This is in order to
balance the number of processes per machine to the extent possible.)
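The --ncpus behavior, including the switch to one process per turn after the first pass around the ring, can be modeled in the same spirit. This is again a simplified sketch rather than MPD's implementation: it assumes the default placement (no -1), ring positions numbered from 0 at the local mpd, and a hypothetical ncpus array giving the --ncpus value of each mpd (1 for an mpd started without the option).

```c
/* Simplified model of placement with --ncpus (hypothetical helper).
   nprocs:      number of MPI processes requested with -n
   nmpds:       number of mpd's in the ring (must be >= 1)
   ncpus:       ncpus[i] is the --ncpus value of the mpd at ring position i
   host_of_rank: output; host_of_rank[r] is the ring position of rank r */
void assign_ranks_to_mpds(int nprocs, int nmpds, const int ncpus[],
                          int host_of_rank[])
{
    int rank = 0;
    int i, k;

    /* First pass around the ring: each mpd, starting with the local one,
       starts up to ncpus[i] processes with consecutive ranks. */
    for (i = 0; i < nmpds && rank < nprocs; i++)
        for (k = 0; k < ncpus[i] && rank < nprocs; k++)
            host_of_rank[rank++] = i;

    /* Later passes: every mpd starts one process per turn, with wraparound. */
    for (i = 0; rank < nprocs; i = (i + 1) % nmpds)
        host_of_rank[rank++] = i;
}
```

For two mpd's each started with mpd --ncpus=4, mpiexec -n 10 a.out would, in this model, give ranks 0 to 3 to the local mpd and ranks 4 to 7 to the other, after which ranks 8 and 9 are placed one at a time, first on the local mpd and then on the other. With every ncpus entry equal to 1, the model reduces to the default one-at-a-time placement.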
Other ways to control the placement of processes are by direct use of
arguments to
mpiexec. See the
Users Guide.
A: On Windows, you need to start the program with mpiexec for
any of the MPI-2 dynamic process functions to work.
A: Output to
stdout and
stderr may not be written
from your process immediately after a
printf or
fprintf (or
PRINT in Fortran) because, under Unix, such output is
buffered unless the program believes that the output is to a
terminal. When the program is run by
mpiexec, the C standard I/O
library (and normally the Fortran runtime library) will buffer the
output. C programmers can either call
fflush(stdout)
to force the output to be written, or disable buffering entirely by
calling
#include <stdio.h>
setvbuf( stdout, NULL, _IONBF, 0 );
on each stream (
stdout in this example) whose output you
want sent immediately to your terminal or file; setvbuf must be called
before anything is written to that stream.
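A minimal C sketch of both approaches follows. The function names are hypothetical; in an MPI program one of them would typically be used on every process, and the unbuffered variant must be called before anything has been written to stdout (for example, right after MPI_Init).

```c
#include <stdio.h>

/* Hypothetical helper: switch stdout to unbuffered mode so that every
   printf is written immediately. Returns 0 on success. */
int make_stdout_unbuffered(void)
{
    return setvbuf(stdout, NULL, _IONBF, 0);
}

/* Hypothetical helper: keep the default buffering, but flush after each
   message so it is not held back. Returns the result of fflush
   (0 on success, EOF on error). */
int print_and_flush(int rank, const char *msg)
{
    printf("[%d] %s\n", rank, msg);
    return fflush(stdout);
}
```

The first approach affects all later output on the stream; the second leaves buffering in place and flushes only at points the program chooses, which can be cheaper when a process writes a lot of output.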
There is no standard way in Fortran either to change the buffering mode or
to flush the output. However, many Fortran compilers provide an
extension for this purpose. For example, in
g77,
call flush()
can be used. The
xlf compiler supports
call flush_(6)
where the argument is the Fortran logical unit number (here
6, which is often the unit number associated with
PRINT).
With the G95 Fortran 95 compiler, set the environment variable
G95_UNBUFFERED_6 to cause output to unit 6 to be unbuffered.
A: By default, g95 does not flush output to stdout. This also appears
to cause problems for standard input. If you are using the Fortran logical
units 5 and 6 (or the * unit) for standard input and output, set the
environment variable G95_UNBUFFERED_6 to yes.
A: To run MPI programs in the background when using MPD, you need to
redirect stdin from
/dev/null. For example,
mpiexec -n 4 a.out < /dev/null &