MPICH Frequently Asked Questions

See FAQ for a more up-to-date list.
  • Introduction
  • Installing MPICH
  • Using MPICH
  • Permission Denied
  • poll: protocol failure
  • Using SSH
  • Compiler Switches
  • MPMD programs
  • Reporting problems and support
  • Algorithms used in MPICH

    Introduction

    MPICH is a freely available, portable implementation of MPI, the Standard for message-passing libraries.

    Installing MPICH

    Building and installing MPICH often requires only
     configure --prefix=/home/me/mpich
     make
     make install
    
    where the value of the --prefix argument to configure is the directory in which MPICH should be installed. See the Installation Guide for more detailed instructions.

    Using MPICH

    Information on using MPICH can be found in the Users' Guide.

    Permission Denied

    Question:
    When I use mpirun to run an MPICH program, I get the message Permission denied, connection reset by peer, or poll: protocol failure in circuit setup.

    Answer:
    If you see something like this

        % mpirun -np 2 cpi 
        Permission denied.
    
    (or connection reset by peer or poll: protocol failure in circuit setup) when using the ch_p4 device, it probably means that you do not have permission to use rsh to start processes. The script tstmachines can be used to test this. For example, if the architecture type (the -arch argument to configure) is sun4, then try
        tstmachines sun4
    
    If this fails, then you may need a .rhosts or /etc/hosts.equiv file (you may need to see your system administrator) or you may need to use the p4 server (see Section sec-p4-server). Another possible problem is the choice of the remote shell program; some systems have several. Check with your systems administrator about which version of rsh or remsh you should be using.

    If your system allows a .rhosts file, do the following:

  • Create a file .rhosts in your home directory.
  • Change the protection on it to user read/write only: chmod og-rwx .rhosts
  • Add one line to the .rhosts file for each machine that you want to use. The format is
        host username
    
    For example, if your username is doe and you want to use machines a.our.org and b.our.org, your .rhosts file should contain
    a.our.org doe
    b.our.org doe
    
    Note the use of fully qualified host names (some systems require this).
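
    The steps above can be scripted. The following is a minimal sketch and not part of MPICH: the function name write_rhosts is hypothetical, and it overwrites the target file, so save any existing .rhosts first.

```shell
#!/bin/sh
# write_rhosts FILE USER HOST...   (hypothetical helper, not part of MPICH)
# Writes one "host username" line per host to FILE and restricts FILE
# to user read/write only. FILE is normally $HOME/.rhosts.
# Any existing content of FILE is overwritten.
write_rhosts() {
    rhosts_file=$1
    rhosts_user=$2
    shift 2
    # Create (or truncate) the file, then remove group/other access
    # before any host lines are written.
    : > "$rhosts_file" || return 1
    chmod og-rwx "$rhosts_file" || return 1
    for rhosts_host in "$@"; do
        printf '%s %s\n' "$rhosts_host" "$rhosts_user" >> "$rhosts_file"
    done
}

# Example, using the user and hosts from the text above:
#   write_rhosts "$HOME/.rhosts" doe a.our.org b.our.org
```

    Use fully qualified host names as arguments if your system requires them.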

    On networks where the use of .rhosts files is not allowed (such as the one in MCS at Argonne), you should use the p4 server to run on machines that are not trusted by the machine from which you start the job.

    Finally, you may need to use a non-standard rsh command within MPICH. MPICH must be reconfigured with -rsh=command_name, and perhaps also with -rshnol if the remote shell command does not support the -l argument. Systems using Kerberos and/or AFS may need this.

    poll: protocol failure during circuit creation

    You may see this message if you attempt to run too many MPI programs in a short period of time. For example, on Linux, when using the ch_p4 device (without the secure server or ssh), MPICH uses rsh to start the MPI processes. Depending on the particular Linux distribution and version, there may be a limit of as few as 40 processes per minute. When running the MPICH test suite or starting short parallel jobs from a script, it is possible to exceed this limit.

    To fix this, you can do one of the following:

    1. Wait a few seconds between running parallel jobs. You may need to wait up to a minute.
    2. Modify /etc/inetd.conf to allow more processes per minute for rsh. For example, change
      shell	stream	tcp	nowait	root	/etc/tcpd2	in.rshd 
      
      to
      shell	stream	tcp	nowait.200	root	/etc/tcpd2	in.rshd 
      

    3. Use the ch_p4mpd device or the secure server option of the ch_p4 device instead. Neither of these relies on inetd.
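
    The edit in option 2 can be sketched as a filter. This is an illustration only: the function name raise_rsh_limit is hypothetical, the sketch assumes a sed that accepts -E, and it writes the modified text to standard output rather than changing /etc/inetd.conf in place. After installing a changed file, inetd usually has to be told to re-read its configuration (check with your system administrator).

```shell
#!/bin/sh
# raise_rsh_limit [LIMIT]   (hypothetical helper, not part of MPICH)
# Filter: reads inetd.conf text on standard input and writes it to
# standard output with "nowait" changed to "nowait.LIMIT" on the rsh
# ("shell") service line. LIMIT defaults to 200, as in the example above.
# Lines for other services, commented-out lines, and lines that already
# carry a limit are passed through unchanged.
raise_rsh_limit() {
    limit=${1:-200}
    sed -E "s/^(shell[[:space:]]+stream[[:space:]]+tcp[[:space:]]+nowait)([[:space:]])/\1.${limit}\2/"
}

# Example: review the result before installing it.
#   raise_rsh_limit < /etc/inetd.conf > /tmp/inetd.conf.new
```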

    Using SSH

    The secure shell (ssh) may be used with the ch_p4 device, but requires careful setup. See configuring with ssh in the installation manual.

    Make sure that ssh is set up to not require a password. The command

     ssh -n `hostname` date
    
    should return the date without any prompts for passwords. See the installation manual if you have problems.
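
    To run the same check against every machine you plan to use, a loop such as the following may help. It is a sketch under assumptions: the function name check_ssh_hosts and the SSH_CMD variable are hypothetical, and the BatchMode option assumes OpenSSH, where it makes ssh fail instead of prompting for a password.

```shell
#!/bin/sh
# check_ssh_hosts HOST...   (hypothetical helper, not part of MPICH)
# Runs "date" on each host through ssh and reports the hosts where
# that fails. Returns 0 only if every host succeeded.
# SSH_CMD is a hypothetical override for the name of the ssh program.
check_ssh_hosts() {
    status=0
    for host in "$@"; do
        # BatchMode=yes (an OpenSSH option) turns a password prompt
        # into a failure, so the loop never blocks waiting for input.
        if ${SSH_CMD:-ssh} -n -o BatchMode=yes "$host" date >/dev/null 2>&1; then
            echo "ok: $host"
        else
            echo "FAILED: $host"
            status=1
        fi
    done
    return $status
}

# Example:
#   check_ssh_hosts a.our.org b.our.org
```

    Any host reported as FAILED still needs its ssh setup corrected before it can be used with the ch_p4 device.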

    Compiler Switches

    Normally, you should let configure determine compiler switches. However, you can use the configure options -cflags=... and -fflags=... to specify special flags. See also compiler switches.

    MPMD (Multiple Program Multiple Data) Programs

    MPICH, depending on the device, supports MPMD programs. However, the mpirun script currently does not support MPMD programs. For the ch_p4 device, the user must create a procgroup file and invoke the program that will have rank zero in MPI_COMM_WORLD with the command-line option -p4pg filename. See the Users' Guide for more information.
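
    As an illustration, the following sketch writes a small procgroup file for a two-program job. The layout shown reflects our understanding of the p4 procgroup format (a first line for the local machine, then one line per remote host giving the host name, the number of processes, and the full path of the program to run there); confirm the details in the Users' Guide. The function name, host name, and program paths are hypothetical.

```shell
#!/bin/sh
# write_procgroup FILE   (hypothetical helper, not part of MPICH)
# Writes a two-line procgroup file to FILE.
#   "local 0"  : no extra processes on the local machine beyond the
#                program started by hand (which gets rank zero)
#   host line  : host name, process count, full path of the program
# The host name and program path are placeholders.
write_procgroup() {
    cat > "$1" <<'EOF'
local 0
b.our.org 1 /home/doe/worker
EOF
}

# Example: start the rank-zero program (a placeholder path) by hand.
#   write_procgroup pgfile
#   /home/doe/master -p4pg pgfile
```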

    Reporting problems and support

    1. First, check the list of known bugs and patches for the problem you are seeing.
    2. Also check the troubleshooting guides in the Installation and Users guides.
    3. If that doesn't help, send mail to mpi-bugs@mcs.anl.gov.

    Algorithms used in MPICH

    1. Does MPICH use IP Multicast for MPI_Bcast?

      No. In principle, MPICH could use multicast, but in practice this would be very difficult. To start with, IP multicast is unreliable; additional code would be needed to make it reliable. In fact, there is an effort to provide a reliable multicast, built on top of the unreliable multicast. The second problem is that not all systems allow user programs (or *any* program) to perform an IP multicast. In fact, that is the case for the systems that we have been developing on. Thus, we will always need the point-to-point version. There is a fairly easy way to replace any collective routine in MPI, but no one has offered us a multicast-based MPI_Bcast yet...