./configure --prefix=/home/me/mpich
make
make install
where the value of the --prefix argument to configure is the directory in which MPICH should be installed. See the Installation Guide for more detailed instructions.
mpirun -np 1 a.out -mpiversion
All of the guides are available at
http://www.mcs.anl.gov/mpi/mpich1/docs.html .
Answer:
If you see something like this
% mpirun -np 2 cpi
Permission denied.
(or
connection reset by peer or
poll: protocol failure in circuit setup)
when using the ch_p4 device, it probably means that
you do not have permission to use rsh to start processes. The script
tstmachines can be used to test this. For example, if the architecture
type (the -arch argument to configure) is sun4, then try
tstmachines sun4
If this fails, then you may need a .rhosts or /etc/hosts.equiv
file (you may need to see your system administrator) or you may need to use
the p4 server.
Another possible problem is the choice of the remote shell program; some
systems have several. Check with your systems administrator about which
version of rsh or remsh you should be using.
If your system allows a .rhosts file, do the following:
host username
For example, if your username is doe and you want to use machines a.our.org and b.our.org, your .rhosts file should contain
a.our.org doe
b.our.org doe
Note the use of fully qualified host names (some systems require this).
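As a rough illustration of the "host username" format, the following shell sketch prints one line per machine. The function name rhosts_lines is hypothetical and not part of MPICH. The chmod step in the usage comment assumes an rshd that ignores a .rhosts file writable by anyone other than its owner, which is common but worth checking on your system.

```sh
# Hypothetical helper (not part of MPICH): print one "host username"
# line for each host given after the username.
rhosts_lines() {
    user=$1
    shift
    for host in "$@"; do
        printf '%s %s\n' "$host" "$user"
    done
}

# Possible usage (assumption: you want these two hosts and nothing else):
#   rhosts_lines doe a.our.org b.our.org > "$HOME/.rhosts"
#   chmod 600 "$HOME/.rhosts"
```

Redirecting into ~/.rhosts overwrites any existing file, so you may prefer to write to a scratch file first and compare.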
On networks where the use of .rhosts files is not allowed (such as the one in MCS at Argonne), you should use the p4 server to run on machines that are not trusted by the machine from which you are initiating the job.
Finally, you may need to use a non-standard rsh command within MPICH. MPICH must be reconfigured with -rsh=command_name, and perhaps also with -rshnol if the remote shell command does not support the -l argument. Systems using Kerberos and/or AFS may need this.
# default: on
# description: The rshd server is the server for the rcmd(3) routine and, \
# consequently, for the rsh(1) program. The server provides \
# remote execution facilities with authentication based on \
# privileged port numbers from trusted hosts.
service shell
{
socket_type = stream
wait = no
user = root
log_on_success += USERID
log_on_failure += USERID
server = /usr/sbin/in.rshd
disable = yes
}
You must enable the service by changing "disable = yes" to "disable = no".
The same must be done to the rlogin config file to enable that service.
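The edit can also be scripted. The sketch below is not part of MPICH: the function name enable_xinetd_service is hypothetical, and the file locations in the usage comment are assumptions about where your distribution keeps the xinetd service files. It changes only a line of the form "disable = yes" and leaves everything else in the file untouched.

```sh
# Hypothetical helper: flip "disable = yes" to "disable = no" in one
# xinetd service file, keeping the file's ownership and permissions.
enable_xinetd_service() {
    conf=$1
    tmp=$(mktemp) || return 1
    if sed 's/^\([[:space:]]*disable[[:space:]]*=[[:space:]]*\)yes[[:space:]]*$/\1no/' \
            "$conf" > "$tmp"; then
        # Write back through the original file instead of replacing it.
        cat "$tmp" > "$conf"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return $status
}

# Possible usage as root (paths are assumptions, check your system):
#   enable_xinetd_service /etc/xinetd.d/rsh
#   enable_xinetd_service /etc/xinetd.d/rlogin
```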
At this point the xinetd daemon must be restarted to register these
changes:
/etc/rc.d/init.d/xinetd restart
At this point you should receive a "Permission denied." if you attempt a command such as "rsh localhost hostname" as a non-root user (or as root for that matter). To allow users to rsh without passwords you need to edit /etc/hosts.equiv, the system-wide host file for rsh and rlogin. This file should hold hostnames of machines that you would like users to be able to start MPICH processes from. For example, simply adding:
localhost.localdomain
should allow users to perform the command "rsh localhost hostname" successfully. Likewise, adding other hostnames will allow users on those hosts to rsh to this host.
However, there is another catch! By default (with medium security) packet filtering is enabled as well, and this will prevent users from remote hosts from connecting to this machine using the rsh or rlogin services. This packet filter, or firewall, is administered using the ipchains package (which is installed by default). The firewall configuration is written out by a program called lokkit at installation time (I think). The configuration is stored in /etc/sysconfig/ipchains and by default looks like this:
# Firewall configuration written by lokkit
# Manual customization of this file is not recommended.
# Note: ifup-post will punch the current nameservers through the
#       firewall; such entries will *not* be listed here.
:input ACCEPT
:forward ACCEPT
:output ACCEPT
-A input -s 0/0 67:68 -d 0/0 67:68 -p udp -i eth0 -j ACCEPT
-A input -s 0/0 67:68 -d 0/0 67:68 -p udp -i eth1 -j ACCEPT
-A input -s 0/0 -d 0/0 -i lo -j ACCEPT
-A input -p tcp -s 0/0 -d 0/0 0:1023 -y -j REJECT
-A input -p tcp -s 0/0 -d 0/0 2049 -y -j REJECT
-A input -p udp -s 0/0 -d 0/0 0:1023 -j REJECT
-A input -p udp -s 0/0 -d 0/0 2049 -j REJECT
-A input -p tcp -s 0/0 -d 0/0 6000:6009 -y -j REJECT
-A input -p tcp -s 0/0 -d 0/0 7100 -y -j REJECT
While an in-depth discussion of ipchains rules is outside the context of this document, it's worth talking about how this works a bit. First, the rules are applied in order from top of the list to the bottom of the list. The argument to -j says what to do if a packet matches; it's usually either ACCEPT (let the packet in), or REJECT (toss it out). If a packet makes it through the entire list then the default policy is applied. In this case the default policy is ACCEPT. The following line tells the packet filter to allow all localhost (-i lo) traffic to pass unmolested:
-A input -s 0/0 -d 0/0 -i lo -j ACCEPT
This line blocks all new TCP connections going to ports 0-1023, which is the range of most services, including rsh/rlogin:
-A input -p tcp -s 0/0 -d 0/0 0:1023 -y -j REJECT
We're going to modify this file to allow rsh and rlogin traffic.
# Firewall configuration written by lokkit
# Manual customization of this file is not recommended.
# Note: ifup-post will punch the current nameservers through the
#       firewall; such entries will *not* be listed here.
:input ACCEPT
:forward ACCEPT
:output ACCEPT
-A input -s 0/0 67:68 -d 0/0 67:68 -p udp -i eth0 -j ACCEPT
-A input -s 0/0 67:68 -d 0/0 67:68 -p udp -i eth1 -j ACCEPT
-A input -s 0/0 -d 0/0 -i lo -j ACCEPT
#
# New rules for rlogin/rsh traffic, incoming or outgoing
#
-A input -p tcp -s 0/0 -d 0/0 513 -b -j ACCEPT
-A input -p tcp -s 0/0 -d 0/0 514 -b -j ACCEPT
#
# End of new rules
#
-A input -p tcp -s 0/0 -d 0/0 0:1023 -y -j REJECT
-A input -p tcp -s 0/0 -d 0/0 2049 -y -j REJECT
-A input -p udp -s 0/0 -d 0/0 0:1023 -j REJECT
-A input -p udp -s 0/0 -d 0/0 2049 -j REJECT
-A input -p tcp -s 0/0 -d 0/0 6000:6009 -y -j REJECT
-A input -p tcp -s 0/0 -d 0/0 7100 -y -j REJECT
At this point users on remote systems with accounts on this system should be able to rsh/rlogin to this machine without using a password.
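Because the rules are applied from top to bottom, the new ACCEPT rules only help if they appear before the REJECT rule that covers ports 0-1023. The shell sketch below shows one way to place them there automatically. The function name add_accept_rules is hypothetical and not part of MPICH or ipchains. It prints a modified copy of the file to standard output and never changes the original, so you can inspect the result before installing it. It does not check whether a rule is already present.

```sh
# Hypothetical helper: print the rule file with an ACCEPT rule for each
# given TCP port inserted just ahead of the first active REJECT rule.
add_accept_rules() {
    conf=$1
    shift
    awk -v ports="$*" '
        # Only uncommented rules (lines starting with -A) count as a match.
        !inserted && /^-A .*-j REJECT/ {
            n = split(ports, p, " ")
            for (i = 1; i <= n; i++)
                printf "-A input -p tcp -s 0/0 -d 0/0 %s -b -j ACCEPT\n", p[i]
            inserted = 1
        }
        { print }
    ' "$conf"
}

# Possible usage (the output file name is an assumption):
#   add_accept_rules /etc/sysconfig/ipchains 513 514 > /tmp/ipchains.new
```

If the file contains no REJECT rule, the output is an unchanged copy; with a default policy of ACCEPT, no extra rule is needed in that case.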
/etc/rc.d/init.d/sshd start
The service will be automatically started on reboot. At this point ssh on the localhost should work, although a password will still be required. However, our firewall rules will be preventing connections from other machines. We again modify /etc/sysconfig/ipchains, this time to allow ssh traffic in and out. See the above section for a discussion of what we are doing here.
# Firewall configuration written by lokkit
# Manual customization of this file is not recommended.
# Note: ifup-post will punch the current nameservers through the
#       firewall; such entries will *not* be listed here.
:input ACCEPT
:forward ACCEPT
:output ACCEPT
-A input -s 0/0 67:68 -d 0/0 67:68 -p udp -i eth0 -j ACCEPT
-A input -s 0/0 67:68 -d 0/0 67:68 -p udp -i eth1 -j ACCEPT
-A input -s 0/0 -d 0/0 -i lo -j ACCEPT
#
# New rules for ssh traffic, incoming or outgoing
#
-A input -p tcp -s 0/0 -d 0/0 22 -b -j ACCEPT
#
# End of new rules
#
-A input -p tcp -s 0/0 -d 0/0 0:1023 -y -j REJECT
-A input -p tcp -s 0/0 -d 0/0 2049 -y -j REJECT
-A input -p udp -s 0/0 -d 0/0 0:1023 -j REJECT
-A input -p udp -s 0/0 -d 0/0 2049 -j REJECT
-A input -p tcp -s 0/0 -d 0/0 6000:6009 -y -j REJECT
-A input -p tcp -s 0/0 -d 0/0 7100 -y -j REJECT
At this point users on remote systems should be able to ssh into the machine, but they will still need a password. Users should set up a private/public authentication key pair in order for ssh to operate without passwords. This process is documented in the installation guide, but a summary of the steps for RH7.2 is included here.
First run "ssh-keygen -t rsa" to create the private/public key pair. By default this will create the files ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub. Use a passphrase. Next place the public key (~/.ssh/id_rsa.pub) in the file ~/.ssh/authorized_keys. If more than one machine is going to be used, then this key must be put in the ~/.ssh/authorized_keys file on each machine. The permissions on the .ssh directory should be set to 700; otherwise sshd may choose not to accept the keys. This will allow you to connect using RSA keys rather than simple UNIX passwords.
The next step is to enable an SSH agent so that you do not need to repeatedly type your passphrase. The agent is started with "ssh-agent <cmd>". Typically <cmd> is $SHELL, so that your default shell is started. The agent will then handle authentication on your behalf any time you attempt to use ssh from this shell. To give the ssh-agent your passphrase, type "ssh-add". This will query you for the passphrase that accompanies your RSA key. Once you have completed this, you will be able to ssh to other systems on which your key is authorized without typing a password.
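The key installation step can be sketched as a small shell function. The name install_pubkey is hypothetical and not part of MPICH or OpenSSH. The mode 700 on the .ssh directory comes from the text above; the mode 600 on authorized_keys is an additional assumption that matches what a strict sshd configuration typically expects.

```sh
# Hypothetical helper: append a public key to authorized_keys and set
# restrictive permissions. The second argument defaults to ~/.ssh.
install_pubkey() {
    pubkey=$1
    sshdir=${2:-$HOME/.ssh}
    mkdir -p "$sshdir" && chmod 700 "$sshdir" || return 1
    cat "$pubkey" >> "$sshdir/authorized_keys" || return 1
    # Assumption: sshd wants authorized_keys writable only by its owner.
    chmod 600 "$sshdir/authorized_keys"
}

# Possible sequence on each machine, following the steps in the text:
#   ssh-keygen -t rsa
#   install_pubkey "$HOME/.ssh/id_rsa.pub"
#   ssh-agent $SHELL
#   ssh-add
```

The function appends rather than overwrites, so keys that are already authorized are kept; running it twice with the same key leaves a duplicate line.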
-A input -p tcp -s 0/0 -d 0/0 6000:6009 -y -j REJECT
-A input -p tcp -s 0/0 -d 0/0 7100 -y -j REJECT
The first blocks incoming TCP connections to ports 6000-6009 (often used by X), while the second blocks incoming TCP connections to port 7100 (often used by the X font server). We simply remove these rules:
# Firewall configuration written by lokkit
# Manual customization of this file is not recommended.
# Note: ifup-post will punch the current nameservers through the
#       firewall; such entries will *not* be listed here.
:input ACCEPT
:forward ACCEPT
:output ACCEPT
-A input -s 0/0 67:68 -d 0/0 67:68 -p udp -i eth0 -j ACCEPT
-A input -s 0/0 67:68 -d 0/0 67:68 -p udp -i eth1 -j ACCEPT
-A input -s 0/0 -d 0/0 -i lo -j ACCEPT
-A input -p tcp -s 0/0 -d 0/0 0:1023 -y -j REJECT
-A input -p tcp -s 0/0 -d 0/0 2049 -y -j REJECT
-A input -p udp -s 0/0 -d 0/0 0:1023 -j REJECT
-A input -p udp -s 0/0 -d 0/0 2049 -j REJECT
#
# Removed these rules to eliminate chance of MPICH comm. failure
#
# -A input -p tcp -s 0/0 -d 0/0 6000:6009 -y -j REJECT
# -A input -p tcp -s 0/0 -d 0/0 7100 -y -j REJECT
#
# End of removed rules
#
This modification, in conjunction with one to allow process startup, should prepare your system for MPICH jobs.
To fix this, you can do one of the following:
shell stream tcp nowait root /etc/tcpd2 in.rshd
to
shell stream tcp nowait.200 root /etc/tcpd2 in.rshd
Make sure that ssh is set up to not require a password. The command
ssh -n `hostname` date
should return the date without any prompts for passwords. See the installation manual if you have problems.
Thanks to Victor Eijkhout for this information.
p4_error: interrupt SIGSEGV
the problem is probably not with MPI. Instead, check for program bugs including
"p1_13043: p4_error: OOPS: semop lock failed"
To fix this, try running the script cleanipcs that is included with MPICH. You can also use the command ipcs to list the shared memory and semaphore resources that are in use on a node. This can help you track down resources that are held by a different user that are preventing your MPI program from running.
Argument #1 of `mpi_bcast' is one type at (2) but is some other type at (1)
This is a strict interpretation of the Fortran standard. To fix this, you will need to tell the compiler to allow this usage. For the GNU g77 compiler, add the command-line option -Wno-globals, as in
mpif77 -Wno-globals mycode.f
An alternative is to use a Fortran 90 or Fortran 95 compiler with the MPI module instead of the mpif.h header file.
mpicc -o overtake overtake.o test.o
ld: 0711-317 ERROR: Undefined symbol: MPIR_F_TRUE
ld: 0711-317 ERROR: Undefined symbol: .MPIR_InitFortranDatatypes
ld: 0711-317 ERROR: Undefined symbol: MPIR_F_FALSE
ld: 0711-317 ERROR: Undefined symbol: .MPIR_InitFortran
ld: 0711-317 ERROR: Undefined symbol: MPIR_I_DCOMPLEX
ld: 0711-317 ERROR: Undefined symbol: .MPIR_Free_Fortran_dtes
ld: 0711-317 ERROR: Undefined symbol: .MPIR_Free_Fortran_keyvals
this usually indicates an error in the make process. For some reason, the Fortran part (which is where these symbols come from) is particularly fragile. To fix this, try the following steps:
cd src/fortran/src
make clean
make
ar r ../../../lib/libmpich.a *.o
ranlib ../../../lib/libmpich.a
If weak symbols are not supported, then also perform these
additional steps:
make clean
make profile
ar r ../../../lib/libpmpich.a *.o
ranlib ../../../lib/libpmpich.a
The problem is that some versions of make have logic errors that cause them to
create files but not to act on them; this causes make to build the object file
but then fail to include it in the archive. The above steps should work
around this problem.
"commreq_free.c", line 70: warning #187: use of "=" where "==" may
have been intended
There is nothing wrong with these statements. The compiler is warning
about a legal, but often misused, feature of the C language. The statements
have been crafted so that most compilers recognize that the "=" was used
intentionally; unfortunately, some compilers insist on warning about this
valid use of C and provide no way to indicate to the compiler that the warning
is unnecessary.
No. In principle, MPICH could use multicast, but in practice this would be very difficult. To start with, IP multicast is unreliable; additional code to make it reliable needs to be added. In fact, there is an effort to provide a reliable multicast, built on top of the unreliable multicast. The second problem is that not all systems allow user programs (or any program) to perform an IP multicast. In fact, that is the case for the systems that we have been developing on. Thus, we will always need the point-to-point version. There is a fairly easy way to replace any collective routine in MPI, but no one has offered us a multicast-based MPI_Bcast yet...