Most massively parallel processors (MPPs) provide a way to start a program on a requested number of processors; mpirun makes use of the appropriate command whenever possible. In contrast, workstation clusters require that each process in a parallel job be started individually, though programs to help start these processes exist (see Section Using the Secure Server below). Because workstation clusters are not already organized as an MPP, additional information is required to make use of them. The mpich installation should include a list of participating workstations in the file machines.<arch> in the directory /usr/local/mpich/share. This file is used by mpirun to choose processors to run on (using heterogeneous clusters is discussed in Section The P4 Procgroup File). The rest of this section discusses some of the details of this process and how you can check for problems. Also see Section In Case of Trouble, particularly the discussion of common problems.
Use the script tstmachines in /usr/local/mpich/sbin to ensure that you can use all of the machines that you have listed. This script performs an rsh and a short directory listing; this tests both that you have access to the node and that a program in the current directory is visible on the remote node. If there are any problems, they will be listed. These problems must be fixed before proceeding.
The only argument to tstmachines is the name of the architecture;
this is the same name as the extension on the machines file. For example,
    /usr/local/mpich/sbin/tstmachines sun4
tests that a program in the current directory can be executed by all of the machines in the sun4 machines list. This program is silent if all is well; if you want to see what it is doing, use the -v (for verbose) argument:
    /usr/local/mpich/sbin/tstmachines -v sun4
The output from this command might look like
    Trying true on host1.uoffoo.edu ...
    Trying true on host2.uoffoo.edu ...
    Trying ls on host1.uoffoo.edu ...
    Trying ls on host2.uoffoo.edu ...
    Trying user program on host1.uoffoo.edu ...
    Trying user program on host2.uoffoo.edu ...
If tstmachines finds a problem, it will suggest possible reasons and solutions. In brief, there are three tests:
1. Can processes be started on remote machines? This attempts to run the shell command true on each machine in the machines file using the remote shell command.
2. Is the current working directory available to all machines? This attempts to ls a file that tstmachines creates, running ls through the remote shell command. Note that ch_p4 does not require that all processors have access to the same file system (see Section The P4 Procgroup File), but the mpirun command does require this.
3. Can user programs be run on remote systems? This checks that shared libraries and other components have been properly installed on all machines.
You can change the remote shell command that the ch_p4 device
uses to start the remote processes with the environment variable
P4_RSHCOMMAND. For example, if the default remote shell
program is rsh but you wish to use the secure shell ssh,
you can do
    setenv P4_RSHCOMMAND ssh
    mpirun -np 4 a.out
This works only for remote shell commands that accept the same command line arguments as the default. If you are having trouble using the remote shell commands, consider using either the secure shell or the ch_p4mpd device.
Section Configuring with ssh explains how to set up your environment so that the ch_p4 device on networks will use the secure shell ssh instead of rsh. This is useful on networks where for security reasons the use of rsh is discouraged or disallowed.
Because each workstation in a cluster (usually) requires that a new user log into it, and because this process can be very time-consuming, mpich provides a program that may be used to speed this process. This is the secure server, serv_p4, located in the directory /usr/local/mpich/bin. The script chp4_servs in the same directory may be used to start serv_p4 on those workstations on which you can run programs with rsh. You can also start the server by hand and allow it to run in the background; this is appropriate on machines that do not accept rsh connections but on which you have accounts.
Before you start this server, check to see if the secure server has
been installed for general use; if so, the same server can be used by
everyone. In this mode, root access is required to install the server. If
the server has not been installed, then you can install it for your own use
without needing any special privileges with
    chp4_servs -port=1234
This starts the secure server on all of the machines listed in the file /usr/local/mpich/share/machines.<arch>.
The port number, provided with the option -port=, must be different from any other port in use on the workstations.
To make use of the secure server for the ch_p4 device, add the
following definitions to your environment:
    setenv MPI_USEP4SSPORT yes
    setenv MPI_P4SSPORT 1234
The value of MPI_P4SSPORT must be the port with which you started the secure server. When these environment variables are set, mpirun attempts to use the secure server to start programs that use the ch_p4 device. (The command line argument -p4ssport to mpirun may be used instead of these environment variables; mpirun -help will give you more information.)
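For readers who launch mpirun from a script rather than from csh, the two settings can be prepared as an environment dictionary. The following is a minimal sketch only: the helper name secure_server_env is hypothetical, and the port range check reflects general TCP port limits rather than anything mpich documents.

```python
import os


def secure_server_env(port, base_env=None):
    """Return a copy of the environment with the secure-server variables set.

    secure_server_env is a hypothetical helper, not part of mpich.
    """
    # Assumption: the secure server port is an ordinary TCP port number.
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    env = dict(os.environ if base_env is None else base_env)
    env["MPI_USEP4SSPORT"] = "yes"
    env["MPI_P4SSPORT"] = str(port)  # must match the -port= value given to chp4_servs
    return env


if __name__ == "__main__":
    # Start from an empty environment so the printed result is easy to read.
    print(secure_server_env(1234, base_env={}))
```

The returned dictionary could then be passed as the env argument of subprocess.run when invoking mpirun.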
When using a cluster of symmetric multiprocessors (SMPs) (with the ch_p4
device configured with -comm=shared), you can control the number of
processes that communicate with shared memory on each SMP node.
First, you need to modify the machines file (see
Section Workstation clusters and the ch_p4 device)
to indicate the number of processes that should
be started on each host. Normally this number should be no greater than the
number of processors; on SMPs with large numbers of processors, the number
should be one less than the number of processors in order to leave one
processor for the operating system. The format is simple: each line of the
machines file specifies a hostname, optionally followed by a colon (:)
and the number of processes to allow. For example, the file containing the lines
    mercury
    venus
    earth
    mars:2
    jupiter:15
specifies three single-processor machines (mercury, venus, and earth), a 2-processor machine (mars), and a 15-processor machine (jupiter).
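The machines file format just described (a hostname, optionally followed by a colon and a process count) can be read with a few lines of code. This is an illustrative sketch, not the parser mpirun uses; the function name parse_machines is hypothetical, and skipping blank lines and lines starting with # is an assumption.

```python
def parse_machines(text):
    """Parse machines-file text into a list of (hostname, max_processes) pairs."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        # Assumption: blank lines and '#' comment lines carry no host entry.
        if not line or line.startswith("#"):
            continue
        host, sep, count = line.partition(":")
        # A host without ":n" is treated as allowing a single process.
        entries.append((host, int(count) if sep else 1))
    return entries


example = """mercury
venus
earth
mars:2
jupiter:15
"""

if __name__ == "__main__":
    for host, limit in parse_machines(example):
        print(host, limit)
```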
By default, mpirun will use at most the number of processors specified in the machines list for each node, up to 16 processes on each machine. By setting the environment variable MPI_MAX_CLUSTER_SIZE to a positive integer value, you can tell mpirun to use up to that many processes, sharing memory for communication, on a host. For example, if MPI_MAX_CLUSTER_SIZE had the value 4, then mpirun -np 9 with the above machines file creates one process on each of mercury, venus, and earth, 2 on mars (2 because the machines file specifies that mars may have 2 processes sharing memory), and 4 on jupiter (because jupiter may have 15 processes and only 4 more are needed). If 10 processes were needed, mpirun would start over from the beginning of the machines file, creating an additional process on mercury; the value of MPI_MAX_CLUSTER_SIZE prevents mpirun from starting a fifth process sharing memory on jupiter.
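The placement rule in this paragraph can be modelled in a few lines. The sketch below is a simplified model of the behaviour as described here, not mpirun's actual implementation; the function name assign_processes and its arguments are hypothetical, and the default of 16 is taken from the text above.

```python
def assign_processes(machines, np, max_cluster_size=16):
    """Model how processes are placed on hosts.

    machines: list of (hostname, limit) pairs in machines-file order.
    Returns a list of (hostname, count) pairs in the order they are filled.
    """
    if max_cluster_size < 1 or any(limit < 1 for _, limit in machines):
        raise ValueError("limits and cluster size must be positive")
    if np > 0 and not machines:
        raise ValueError("no machines available")
    placement = []
    remaining = np
    while remaining > 0:
        # Walk the machines list; wrap around to the start if more are needed.
        for host, limit in machines:
            if remaining == 0:
                break
            count = min(limit, max_cluster_size, remaining)
            placement.append((host, count))
            remaining -= count
    return placement


machines = [("mercury", 1), ("venus", 1), ("earth", 1), ("mars", 2), ("jupiter", 15)]

if __name__ == "__main__":
    print(assign_processes(machines, 9, max_cluster_size=4))
    print(assign_processes(machines, 10, max_cluster_size=4))
```

With the example machines list and a cluster size of 4, this model reproduces the two placements described above: nine processes end with four on jupiter, and a tenth wraps around to mercury.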
A heterogeneous network of workstations is one in which the machines
connected by the network have different architectures and/or operating
systems. For example, a network may contain 3 Sun SPARC (sun4)
workstations and 3 SGI IRIX workstations, all of which communicate via
the TCP/IP protocol. The mpirun command may be told to use all of
these by using multiple -arch and -np arguments.
For example, to run a program on 3 sun4s and 2 SGI IRIX workstations, use
    mpirun -arch sun4 -np 3 -arch IRIX -np 2 program.%a
The special program name program.%a allows you to specify the different executables for the program, since a Sun executable won't run on an SGI workstation and vice versa. The %a is replaced with the architecture name; in this example, program.sun4 runs on the Suns and program.IRIX runs on the SGI IRIX workstations. You can also put the programs into different directories; for example,
    mpirun -arch sun4 -np 3 -arch IRIX -np 2 /tmp/%a/program
It is important to specify the architecture with -arch before specifying the number of processors. Also, the first -arch argument must refer to the processor on which the job will be started. Specifically, if -nolocal is not specified, then the first -arch must refer to the processor from which mpirun is running.
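The %a substitution amounts to a plain string replacement. The following sketch illustrates the effect for the two program names used above; expand_program is a hypothetical name, and the code does not claim to mirror how mpirun performs the substitution internally.

```python
def expand_program(template, arch):
    """Replace every %a in the program name with the architecture name."""
    return template.replace("%a", arch)


if __name__ == "__main__":
    for arch in ("sun4", "IRIX"):
        print(arch, expand_program("program.%a", arch),
              expand_program("/tmp/%a/program", arch))
```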
For even more control over how jobs get started, we need to look at how
mpirun starts a parallel program on a workstation cluster.
Each time mpirun
runs, it constructs and uses a new file of machine names
for just that run, using the machines file as input. (The new file is
called PIyyyy, where yyyy is the process identifier.) If you
specify -keep_pg on your mpirun invocation, you can use this
information to see where mpirun ran your last few jobs. You can
construct this file yourself and specify it as an argument to mpirun.
To do this for ch_p4, use
    mpirun -p4pg pgfile myprog
where pgfile is the name of the file. The file format is defined below.
This is necessary when you want closer control over the hosts you run on, or when mpirun cannot construct it automatically. Such is the case when
- You want to run different executables on different hosts (your program is not SPMD).
- You want to run on a network of shared-memory multiprocessors and need to specify the number of processes that will share memory on each machine.
Each line of the procgroup file has the form
    <hostname> <#procs> <progname> [<login>]
An example of such a file, where the command is being issued from host sun1, might be
    sun1 0 /users/jones/myprog
    sun2 1 /users/jones/myprog
    sun3 1 /users/jones/myprog
    hp1 1 /home/mbj/myprog mbj
The above file specifies four processes, one on each of three suns and one on another workstation where the user's account name is different. Note the 0 in the first line. It indicates that no processes are to be started on host sun1 other than the one started directly by the user's command.
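The line format above (hostname, process count, program name, optional login) maps naturally onto a small record type. This is a minimal sketch assuming whitespace-separated fields; ProcgroupEntry and parse_procgroup are hypothetical names, not part of mpich.

```python
from typing import NamedTuple, Optional


class ProcgroupEntry(NamedTuple):
    hostname: str
    nprocs: int
    progname: str
    login: Optional[str]  # None when the login field is omitted


def parse_procgroup(text):
    """Parse procgroup text into a list of ProcgroupEntry records."""
    entries = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (3, 4):
            raise ValueError(f"expected 3 or 4 fields, got: {raw!r}")
        host, nprocs, prog = fields[:3]
        login = fields[3] if len(fields) == 4 else None
        entries.append(ProcgroupEntry(host, int(nprocs), prog, login))
    return entries


example = """sun1 0 /users/jones/myprog
sun2 1 /users/jones/myprog
sun3 1 /users/jones/myprog
hp1 1 /home/mbj/myprog mbj
"""

if __name__ == "__main__":
    for entry in parse_procgroup(example):
        print(entry)
```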
You might want to run all the processes on your own machine, as a test.
You can do this by repeating its name in the file:
    sun1 0 /users/jones/myprog
    sun1 1 /users/jones/myprog
    sun1 1 /users/jones/myprog
This will run three processes on sun1, communicating via sockets.
To run on a shared-memory multiprocessor, with 10 processes, you would use
a file like:
    sgimp 9 /u/me/prog
Note that this is for 10 processes, one of them started by the user directly, and the other nine specified in this file. This requires that mpich was configured with the option -comm=shared; see the installation manual for more information.
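The counting rule used in these examples is that the total number of processes equals the one started directly by the user plus the counts given on every line of the file. A small sketch makes the arithmetic explicit; total_processes is a hypothetical helper written for this illustration.

```python
def total_processes(procgroup_text):
    """Return the number of processes a procgroup file describes.

    The process started directly by the user is not counted in the file,
    so it is added to the sum of the per-line counts.
    """
    counts = [int(line.split()[1])
              for line in procgroup_text.splitlines() if line.strip()]
    return 1 + sum(counts)


three_on_sun1 = """sun1 0 /users/jones/myprog
sun1 1 /users/jones/myprog
sun1 1 /users/jones/myprog
"""
shared_memory = "sgimp 9 /u/me/prog\n"

if __name__ == "__main__":
    print(total_processes(three_on_sun1))
    print(total_processes(shared_memory))
```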
If you are logged into host gyrfalcon and want to start a job with
one process on gyrfalcon and three processes on alaska, where
the alaska processes communicate through shared memory, you would use
    local 0 /home/jbg/main
    alaska 3 /afs/u/graphics
It is not possible to provide different command line arguments to different MPI processes.
In some installations, certain
hosts can be connected in multiple ways. For example, the ``normal'' Ethernet
may be supplemented by a high-speed FDDI ring. Usually, alternate host names
are used to identify the high-speed connection. All you need to do is put
these alternate names in your machines.xxxx file.
In this case, it is important not to use the form local 0 but to use
the name of the local host. For example, if hosts host1 and
host2 have ATM connections named host1-atm and host2-atm
respectively, the correct ch_p4 procgroup file to connect them
(running the program /home/me/a.out) is
    host1-atm 0 /home/me/a.out
    host2-atm 1 /home/me/a.out
As described at the end of Section Using Shared Libraries,
it is sometimes necessary to ensure that environment variables have
been communicated to the remote machines before the program that makes
use of shared libraries starts. The various remote shell commands
(e.g., rsh and ssh) do not do this.
Fortunately, the secure server (Section Using the Secure Server) does
communicate the environment variables.
This server is built and installed as part of the ch_p4 device,
and can be installed on all machines in the machines file for the current
architecture (assuming that there is a working remote shell command) with
    chp4_servs -port=1234
The secure server propagates all environment variables to the remote process, and ensures that the environment in which that process (containing your MPI program) runs contains all environment variables that start with LD_ (just in case the system uses LD_SEARCH_PATH or some other name for finding shared libraries).
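To see which variables fall under the LD_ rule on a given system, the prefix match can be reproduced directly. This sketch only illustrates the prefix test; it says nothing about how the secure server itself is implemented, and both ld_variables and the sample values are hypothetical.

```python
def ld_variables(env):
    """Return the subset of env whose names start with LD_."""
    return {name: value for name, value in env.items() if name.startswith("LD_")}


# Hypothetical environment used only for illustration.
sample_env = {
    "HOME": "/home/me",
    "LD_LIBRARY_PATH": "/opt/example/lib",
    "LD_SEARCH_PATH": "/opt/example/lib",
    "OLD_PATH": "/tmp",
}

if __name__ == "__main__":
    print(sorted(ld_variables(sample_env)))
```

Passing os.environ instead of sample_env lists the matching variables in the current shell environment.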
By default, the working directory for processes running remotely with the ch_p4 device is the same as that of the executable. To specify a different working directory, use -p4wdir as follows:
    mpirun -np 4 myprog -p4wdir myrundir