MPICH.NT FAQ
To use the default launcher, mpd, the following must be true:
If the top two are not true then you cannot use MPICH. If you do not have administrator privileges you can still use mpd from the command line without installing it. Download the source distribution and read the manual for instructions on how to use mpd from a command prompt.
No. The Unix TCP device uses p4 code while the Windows device uses Windows-specific code, and the two are not compatible. Even if you had a launcher that could start processes on both Linux and Windows nodes, the Windows device cannot make socket connections to the p4 device on the Linux nodes, and vice versa.
No. The libraries and project files provided were all created with VC 6 and Visual Fortran 6. If you want to use an older version of the compiler you will have to re-compile the MPICH source. This may require editing the project files by hand so they can be read by an older version of Visual Studio.
In a limited way, yes. The TCP/IP device for Windows has code that only runs on Windows NT/2000/XP, but you can use the -localonly option to mpirun on a Win9x machine. This means you can run multiple processes on a single Win9x machine but you cannot run applications across multiple Win9x machines. This capability is provided so you can compile and test programs on a single Win9x machine and then run the code on an NT cluster at some other time. To install on a Win9x machine, download the source distribution, unzip the contents, use mpirun from the bin directory, and make sure the dlls in the lib directory are in your path. Help files are in the www directory (www\index.html).
The Cygwin environment has problems with the Windows API function CreateProcess. A workaround was introduced in mpich.nt.1.2.2 (Oct 10, 2001). That version and more recent versions of mpirun work in a bash shell running in a command prompt. mpirun does not work in the XFree86 windowing environment.
1) Put printf's and fflush(stdout)'s in your program.
2) The help pages describe how to launch an application by hand without using mpirun. It involves setting environment variables by hand in a command prompt and then executing the application. You can use this method to debug an application. First, bring up two command prompts and set the environment variables so that you can run a two process job. Then instead of running the application, execute "msdev myapp.exe". This will bring up the developer studio and then you can step through the code using the debugger. See the help pages for the specific environment variables to set.
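For the first approach, a minimal sketch of a print-and-flush helper is shown here. The function name trace and the "[rank N]" prefix are hypothetical and are not part of MPICH; the point is only that every printf is followed by fflush(stdout), so the output is not left sitting in a buffer if the process crashes or hangs right after the call.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper (not part of MPICH): print a line tagged with the
 * process rank and flush stdout immediately, so the text is not lost in
 * a buffer if the process crashes or hangs afterwards.
 * Returns the number of characters written, or a negative value on error. */
int trace(int rank, const char *fmt, ...)
{
    va_list args;
    int n, m;

    assert(fmt != NULL);

    n = printf("[rank %d] ", rank);
    if (n < 0)
        return n;

    va_start(args, fmt);
    m = vprintf(fmt, args);
    va_end(args);
    if (m < 0)
        return m;

    /* The flush is the important part: without it the line may never
     * reach the console of the machine that ran mpirun. */
    if (fflush(stdout) != 0)
        return -1;

    return n + m;
}
```

In an MPI program the rank would typically come from MPI_Comm_rank, and a call might look like trace(rank, "entering solver, n=%d\n", n).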
This error usually occurs when you try to launch an executable from a shared directory on Windows NT Workstation, Windows 2000 Professional, or Windows XP Professional. The professional versions of Windows, as opposed to the server editions, have limitations on their file sharing capabilities. Place the executable on a network share on a server machine or copy the executable to the local drive of each machine.
1) The process launcher for MPICH, mpd, runs as a service. When it launches processes they are put in their own hidden desktop. Any windows these processes bring up are hidden from view. If you must be able to see your windows, you can allow processes to share the default desktop by re-installing mpd with the interact option. Execute "mpd -remove" to uninstall and then execute "mpd -install -interact" to re-install. This will not work for a terminal services session. This will only allow windows to show up on the default logon desktop (the monitor directly connected to the host). Also, there may be permission issues if a user is logged on to a machine and a different user attempts to launch a process on the same machine. So this is not the default nor recommended method of installation.
2) But sometimes I can see my windows, even with the default installation. This is true. If mpirun determines that you are only running processes on the local machine, it bypasses mpd and launches the processes in the current context, which allows you to see your windows. When mpirun parses a configuration file, it always uses mpd. guiMPIRun always uses mpd.
MPIRun options must be specified before the name of the executable. Any options specified after the executable will be passed as arguments to the executable and not parsed as mpirun options. For example: "mpirun -np 5 myapp.exe -machinefile filename" will not use the machine file specified by 'filename' because mpirun considers this an argument to the application.
The TEMP value specified in MPIConfig must be a local path. You cannot use a network path such as \\server\share as the temporary path. It must be something like c:\ or c:\temp.
The Fortran dlls that come with the Fortran compilers are not very version friendly. Make sure you have the same version of the compiler dlls on all the machines, or place the compiler dlls in the same directory as your executable. Windows loads dlls from the executable directory first before searching the path, so this will ensure that all the processes use the same dlls.
There are several things that can cause your job to crash before it even starts. The process launcher by default prevents any popup windows from appearing when your processes crash. So if the job crashes at startup you may not know that it has even run. The two main causes are:
1) You are missing a dll required by the process. Many times you will compile a program that works on the local machine and then crashes on a remote machine because the remote machine does not have the necessary dlls. If you compiled with cygwin you need to make sure the cygwin dll is on all the nodes. If you compile with the Microsoft Visual tools you need to make sure those libraries are available. One way to solve this problem without copying files to the remote nodes is to place all necessary dlls in the same directory as the executable. Windows looks in the directory of the executable first before searching the path for dlls.
2) Your executable is broken. If you place a very large array in the global space (popular with Fortran developers) the process will crash at load time if this exceeds the process's reserved global variable space.
You must have the same account credentials on all the nodes participating in the mpich job. If your cluster is set up with a domain controller then you can use a domain account to launch an mpich job. If you do not have a domain controller then you must set up user accounts on all the nodes individually with the same credentials on each node. Each user can have whatever password they choose, but they must use the same password on all the nodes. In other words, UserA-PasswordA must be the same on all the nodes and UserB-PasswordB must be the same on all the nodes, etc.
The executable used in an mpich job must be available to all the nodes participating in the job. The path to the executable must be valid on all the nodes. This can be accomplished by copying the executable to a common location on all the nodes or copying it to a shared location. For example, you could copy cpi.exe to c:\temp\cpi.exe on all the nodes and run "mpirun -np 3 c:\temp\cpi.exe". Or you could copy cpi.exe to a shared directory \\myhost\myshare\cpi.exe and then execute "mpirun -np 3 \\myhost\myshare\cpi.exe".
MPICH uses two to four threads internally. Some users may want to increase the thread stack size so they can have large static variables. This is common with many Fortran programs. The problem is that your program runs out of memory for the stack. You can't set the thread stack size greater than one quarter of the total available stack memory. The easiest solution is to leave the stack size alone and allocate your large variables with malloc.
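A minimal sketch of that suggestion in C follows. The names alloc_grid and fill_and_sum are hypothetical and chosen only for illustration. A large array that would otherwise be declared as a static or global variable is requested from the heap at run time and released when it is no longer needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate an array of n doubles on the heap,
 * zero-initialized the way a static array would be.
 * Returns NULL if the allocation fails. */
double *alloc_grid(size_t n)
{
    assert(n > 0);
    return calloc(n, sizeof(double));
}

/* Example use: fill the heap array with 0, 1, ..., n-1 and return the sum.
 * Returns -1.0 if the array could not be allocated. */
double fill_and_sum(size_t n)
{
    double *grid = alloc_grid(n);
    double sum = 0.0;
    size_t i;

    if (grid == NULL)
        return -1.0;

    for (i = 0; i < n; i++)
        grid[i] = (double)i;
    for (i = 0; i < n; i++)
        sum += grid[i];

    free(grid);   /* release the memory once the data is no longer needed */
    return sum;
}
```

Because the allocation happens at run time, a failure shows up as a NULL return that the program can report, rather than as a crash while the process is being loaded.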