Quick Start Guide to SUT
This guide shows how to get started quickly with SUT. Note that SUT is
currently in beta, so you may not want to use it in a production
environment.
Conventions
In this guide, the dollar sign ($) is used to indicate a shell prompt. This
may vary at your site.
Installing MPICH
SUT has only been tested with the MPICH implementation of MPI, so this section
shows how to install MPICH.
Get MPICH
Download the most recent version of MPICH from the MPICH download page. (See the MPICH homepage for more information.)
Unpack the distribution
$ gunzip -c mpich.tar.gz | tar xf -
Configure MPICH
$ cd mpich
$ ./configure --with-device=ch_p4mpd
The configure step may take some time. You may also want to change the
installation directory by adding a --prefix flag to the configure line. For
example, to install MPICH in /usr/local/mpich, use
$ ./configure --with-device=ch_p4mpd --prefix=/usr/local/mpich
The option --with-device=ch_p4mpd means that MPICH will build and use MPD.
Build and Install MPICH
$ make
If there were no errors in the build, you can install by typing
$ make install
You may need to become root to install MPICH, depending on the location you
specified in the configure step. Both build and installation take some time.
Distribute MPICH
At this point, you need to copy MPICH to all the nodes in your cluster. Make
sure that MPICH is installed in the same place on all machines; otherwise your
MPI programs (including SUT) may not work. If you are not the cluster
administrator, ask the administrator about the proper way to copy MPICH to all
of your nodes.
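If you manage the nodes yourself and they do not share a file system, one possible way to make the copy is rsync over ssh. The following is a minimal sketch, assuming passwordless ssh and rsync are available on every node; the function name distribute_tree is hypothetical and not part of MPICH or SUT.

```shell
# Sketch (hypothetical helper, not part of MPICH or SUT): mirror an installed
# tree, e.g. MPICH's --prefix directory, to the same path on each named node.
# ASSUMES passwordless ssh and rsync on every node.
distribute_tree() {
    dir=$1
    shift
    for host in "$@"; do
        # Make sure the parent directory exists, then copy the tree so it
        # lands at exactly the same path as on this machine.
        ssh "$host" "mkdir -p '$(dirname "$dir")'" &&
            rsync -a "$dir/" "$host:$dir/" ||
            echo "distribute_tree: copy to $host failed" >&2
    done
}
```

For example, distribute_tree /usr/local/mpich host2 host3 would mirror /usr/local/mpich to the same path on host2 and host3, which is exactly the same-location requirement above. Writing to a system directory such as /usr/local may require root on the remote nodes.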
Installing SUT
Get SUT
Go to the download page and get the current distribution.
Unpack SUT
If you got the tar gzip distribution, use
$ gunzip -c sut-<version>.tar.gz | tar xf -
If you got the tar bzip2 distribution, use
$ bunzip2 -c sut-<version>.tar.bz2 | tar xf -
If you got the tar Z (compress) distribution, use
$ uncompress -c sut-<version>.tar.Z | tar xf -
If you got the zip distribution, use
$ unzip sut-<version>.zip
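If you script the installation, the four cases above can be folded into one shell function that chooses the command from the file extension. This is a sketch; the name unpack is hypothetical.

```shell
# Sketch (hypothetical helper): unpack a SUT distribution using the command
# that matches its extension, as listed above.
unpack() {
    case $1 in
        *.tar.gz)  gunzip -c "$1" | tar xf - ;;
        *.tar.bz2) bunzip2 -c "$1" | tar xf - ;;
        *.tar.Z)   uncompress -c "$1" | tar xf - ;;
        *.zip)     unzip "$1" ;;
        *)         echo "unpack: unrecognized archive: $1" >&2; return 1 ;;
    esac
}
```

For example, unpack sut-<version>.tar.bz2 runs the bunzip2 pipeline shown above.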
Configure SUT
$ cd sut-<version>
$ ./configure
Again, you can give a --prefix option to specify where SUT will be installed.
Build and Install SUT
$ make
If there were no errors in the build, you can install by typing
$ make install
You may need to become root to install SUT, depending on the location you
specified in the configure step.
Distribute SUT
At this point, you need to copy SUT to all the nodes in your cluster. SUT only
needs to be in the execution path on each node, but for administrative purposes
it may be easier to put it in the same location on each node. If you are not
the cluster administrator, ask the administrator about the proper way to copy
SUT to all of your nodes.
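Since SUT only has to be on the execution path, a quick check is to ask each node whether it can find a SUT command such as ptls. This is a rough sketch, assuming passwordless ssh; the PATH seen by a non-interactive ssh session may not match the environment in which MPD runs SUT, so treat success as a hint rather than a guarantee. The name check_in_path is hypothetical.

```shell
# Sketch (hypothetical helper): report nodes on which a non-interactive ssh
# session cannot find the given command; returns non-zero if any node lacks it.
# ASSUMES passwordless ssh to every node.
check_in_path() {
    cmd=$1
    shift
    missing=0
    for host in "$@"; do
        if ! ssh -n "$host" "command -v $cmd" >/dev/null 2>&1; then
            echo "$host: $cmd not found in PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_in_path ptls host2 host3 prints nothing and succeeds if both nodes can find ptls.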
Using SUT
Before you use SUT, you need to start MPD on all of the nodes that you want to
run on.
Setting up an MPD ring
Because MPD uses a ring topology, a network of MPDs is called an
MPD ring. The following example shows how to set up such a
ring.
Setting up .mpdpasswd
MPD authenticates connections using a file named '.mpdpasswd' in the user's
home directory. This file contains a random string, which must be identical on
all hosts on which you plan to run MPD. You must create this file yourself and
distribute it to all your hosts. Make sure the file is readable and writable
only by you (or the appropriate user).
As an example, .mpdpasswd might contain the text
,@#af!#13ng,01nkav
which was generated randomly. Then set the permissions so that only you can
read and write the file:
$ chmod 600 ~/.mpdpasswd
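The random string does not have to be typed by hand. The sketch below, assuming /dev/urandom exists (true on Linux and most Unix systems, but an assumption here), writes 30 random hex digits to the file and restricts it to its owner. The name make_mpdpasswd is hypothetical.

```shell
# Sketch (hypothetical helper): write a random secret to ~/.mpdpasswd (or the
# file given as $1) and make it readable and writable only by its owner.
# ASSUMES /dev/urandom is available. Overwrites any existing file.
make_mpdpasswd() {
    file=${1:-$HOME/.mpdpasswd}
    # 15 random bytes printed as 30 hex digits; od reads exactly 15 bytes.
    secret=$(od -An -N15 -tx1 /dev/urandom | tr -d ' \n') || return 1
    # umask 077 creates the file as mode 600; chmod also fixes a file
    # that already existed with looser permissions.
    (umask 077 && printf '%s\n' "$secret" > "$file") && chmod 600 "$file"
}
```

Run it once, then copy the resulting file to the other hosts (for instance with scp -p, which preserves the file mode) so that the string matches everywhere. Running it again on another host would produce a different string.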
Start the first node
Log on to any machine that you want to be in your ring. For this example,
assume that this machine is called myhost. Start MPD in the
background by using
$ mpd &
Now run mpdtrace to see the result
$ mpdtrace
mpdtrace: myhost_4075: lhs=myhost_4075 rhs=myhost_4075 rhs2=myhost_4075
This output shows that an MPD ring is now up and running with one node,
myhost, listening on port 4075. Because myhost is the only node, all three
neighbor fields (lhs, rhs, and rhs2) point back to myhost itself.
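The port number is the part after the underscore in the node name. The sketch below extracts it from the first line of mpdtrace output, assuming the line layout shown above; mpd_port is a hypothetical helper name.

```shell
# Sketch (hypothetical helper): print the port from the first line of
# mpdtrace output on stdin, e.g. "mpdtrace: myhost_4075: lhs=..." -> 4075.
# ASSUMES the "mpdtrace: <host>_<port>: ..." layout shown above.
mpd_port() {
    sed -n '1s/^mpdtrace: [^:]*_\([0-9][0-9]*\):.*/\1/p'
}
```

Run on myhost, mpdtrace | mpd_port would print 4075 for the output above.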
Starting MPD on the remaining nodes
Now that one node is running, you can complete the ring by starting MPD on
each of the other nodes, giving it the host (-h) and port (-p) of the node that
is already in the ring:
$ mpd -h myhost -p 4075 &
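Instead of logging on to every node, you could start the remaining MPDs from myhost over ssh. This is a minimal sketch, assuming passwordless ssh and that mpd is on the PATH of a non-interactive ssh session on each node; join_ring is a hypothetical name.

```shell
# Sketch (hypothetical helper): start mpd on each named node over ssh,
# joining the ring through an existing member given by host and port.
# ASSUMES passwordless ssh and mpd on the remote PATH.
join_ring() {
    entry=$1
    port=$2
    shift 2
    for host in "$@"; do
        # nohup plus redirecting all three streams lets ssh return at once
        # while mpd keeps running in the background on the remote node.
        ssh -n "$host" "nohup mpd -h $entry -p $port </dev/null >/dev/null 2>&1 &"
    done
}
```

For example, join_ring myhost 4075 host2 host3 runs the command above on host2 and host3.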
Once MPD is running on all the nodes that you want in your ring, you can see
what the ring looks like by running mpdtrace on any node in the ring:
$ mpdtrace
mpdtrace: myhost_4075: lhs=host2_2988 rhs=host3_1628 rhs2=host2_2988
mpdtrace: host2_2988: lhs=host3_1628 rhs=myhost_4075 rhs2=host3_1628
mpdtrace: host3_1628: lhs=myhost_4075 rhs=host2_2988 rhs2=myhost_4075
In this example, there is a ring of 3 hosts, myhost, host2,
and host3.
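In the output above, rhs names each node's right-hand neighbor, so following the rhs links from any node should visit every node once and return to the start (here myhost -> host3 -> host2 -> myhost). The sketch below performs that walk on mpdtrace output as a sanity check; check_ring is a hypothetical helper, and it assumes the line layout shown above.

```shell
# Sketch (hypothetical helper): walk the rhs links in mpdtrace output read on
# stdin and report whether they form one closed ring through every listed node.
# ASSUMES lines of the form "mpdtrace: <node>: lhs=... rhs=... rhs2=...".
check_ring() {
    awk '
        {
            name = $2
            sub(/:$/, "", name)              # "myhost_4075:" -> "myhost_4075"
            for (i = 3; i <= NF; i++)
                if ($i ~ /^rhs=/)            # rhs2= does not match this pattern
                    rhs[name] = substr($i, 5)
            if (NR == 1)
                start = name
            n++
        }
        END {
            node = start
            steps = 0
            sep = ""
            do {
                printf "%s%s", sep, node     # print the path as we walk it
                sep = " -> "
                node = rhs[node]
                steps++
            } while (node != start && steps <= n)
            print ""
            # Exit 0 only if we returned to the start after exactly n hops.
            exit !(node == start && steps == n)
        }'
}
```

For the three-node output above, mpdtrace | check_ring prints myhost_4075 -> host3_1628 -> host2_2988 and exits with status 0.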
For more information about MPD, check the
MPICH User's Guide.
Examples of using the actual tools
Now that everything is installed and you have an MPD ring, you can begin to
use SUT. This section gives a few examples to get you started.
Example 1: Listing the directory
To list the current directory on all the nodes in your MPD ring, use
$ ptls -all
myfile1
myfile2
myfile1
The output above may look confusing. By default, ptls simply lists the files in
the directories on all the nodes you specified, one listing after another,
without saying which node each file is on. This can be useful for some
applications, but often you want to see which nodes the files are on. To do
this, use the -h option
$ ptls -all -h
[host2.domain.tld]
myfile1
myfile2
[host3.domain.tld]
myfile1
Here, each bracketed header names the node on which the files listed below it reside.
The -C option is also useful for getting columnar output from ptls
$ ptls -all -Ch
[host2.domain.tld]
myfile1 myfile2
[host3.domain.tld]
myfile1
A Note on Nodes
In the preceding example, the first option given was -all. This means that
ptls should run on all of the nodes in the MPD ring. The -all option is a
useful shorthand, but often you want to run a command on only a subset
of the hosts in your ring.
For this, use the -m and -M options. These two options are the same, except
that -m reads the list of nodes from a file while -M takes the list of nodes
from the next argument.
Basic node specification is simple: list the names of the nodes on which you
wish to run, separated by white space. For example,
the command
$ ptls -M "myhost host2"
is valid.
For a few nodes, this verbose listing of the node names is acceptable, but when
you start using the commands on a large number of machines, it becomes unwieldy.
Thus, SUT offers an abbreviation syntax for nodes. Suppose that you wanted to
run ptls on hosts host1 through host30 and host52 in
your large cluster. You can do this by using
$ ptls -M "host%d@1-30,52"
The host specification here is broken into two parts: the part before the '@'
symbol and the part after it. The part before the '@' is the format.
This specifies how the node names look. In this example, the node names are
of the form host<number>. The %d in the format is where
the numbers belong in the node name. (The format is similar to those specified
for the printf C function.) The part after the '@' is the list of
numbers that belong in the format given before the '@'. This list can consist
of single numbers, such as 52 in this example, or ranges of numbers, such as
1-30 in this example.
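The expansion rule can be made concrete with a small shell function. This is a sketch of the behavior described above, not SUT's own parser: expand_hosts is a hypothetical name, and it handles only comma-separated single numbers and a-b ranges.

```shell
# Sketch (hypothetical helper, not SUT's parser): expand "<format>@<list>",
# where <list> mixes single numbers and a-b ranges separated by commas and
# each number is substituted into the printf-style <format>.
expand_hosts() {
    case $1 in
        *@*) ;;
        *) printf '%s\n' "$1"; return ;;   # no '@': a plain node name
    esac
    fmt=${1%%@*}          # part before the '@', e.g. "host%d"
    list=${1#*@}          # part after it, e.g. "1-30,52"
    names=
    old_ifs=$IFS
    IFS=,                 # split the number list on commas
    for item in $list; do
        case $item in
            *-*) lo=${item%-*} hi=${item#*-} ;;   # a range such as 1-30
            *)   lo=$item hi=$item ;;            # a single number such as 52
        esac
        i=$lo
        while [ "$i" -le "$hi" ]; do
            names="$names $(printf "$fmt" "$i")"
            i=$((i + 1))
        done
    done
    IFS=$old_ifs
    printf '%s\n' "${names# }"   # drop the leading space
}
```

Here expand_hosts 'host%d@1-30,52' prints host1 through host30 followed by host52 on one line. Because the format goes to printf, 'node%02d@8-10' gives node08 node09 node10 in this sketch; this guide does not say whether SUT itself accepts flags such as %02d.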
Note that in Example 1, when ptls was run with the -all option, the command ran
only on host2 and host3. This is because -all actually means
"run on all nodes except the current one." (Example 1 was assumed to be run on
myhost.) This makes sense in many cases, as you will see in the ptcp
example below. To run on the current host as well, you must name it
explicitly in a node list.
NOTE: The -all, -m <machine file>, or
-M <machine list> must be the first option given to any of the parallel
SUT commands.
Example 2: Copying files
To copy the file 'BIGFILE' to all nodes (except the current one), use
$ ptcp -all BIGFILE .
This will copy 'BIGFILE' to the current directory on all of the nodes in the
MPD ring.
Recursive copying of directories is also possible with -r, just as with the normal cp
$ ptcp -all -r mydir/ .
A Note on the Current Working Directory
In the examples above, the 'current directory' was mentioned but not explained.
When you run a command on one node, your current working directory on that node
is taken to be your current working directory on all nodes on which the command
runs. If the directory you are in on one node does not exist on another, the
current working directory on that other node is your home directory.
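The rule can be stated as a tiny shell function, as it would apply on any one node. It models the behavior described above and is not taken from SUT's source; remote_cwd is a hypothetical name.

```shell
# Sketch (hypothetical helper, models the documented rule, not SUT's code):
# given the caller's directory, print the directory a command would use on
# this node: the same path if it exists here, otherwise $HOME.
remote_cwd() {
    if [ -d "$1" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$HOME"
    fi
}
```

Running the equivalent test on each node before a ptrm or ptcp, for example ssh host2 test -d "$PWD", shows in advance whether the command will act in your directory or in your home directory.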
NOTE: Pay very close attention to what your current working
directory on all nodes is whenever using SUT commands. You can get unexpected
results if you are not careful, including data loss. This is especially true
when using potentially destructive commands such as ptrm or even ptcp!
Contact Emil Ong about issues concerning this page.