Message passing is a common method for writing programs for
distributed-memory parallel computers. Unfortunately, the lack of a standard
for message passing has hampered the construction of portable and efficient
parallel programs. In an attempt to remedy this problem, a number of
groups have developed their own message-passing systems, each with its
own strengths and weaknesses. Chameleon [9] is a second-generation
system of this type. Rather than replacing these existing systems, Chameleon
is meant to supplement them by providing a uniform way to access many of
them.
Chameleon's goals are to (a) be very lightweight (low
overhead), (b) be highly portable, and (c) help standardize program
startup and the use of emerging message-passing operations such as collective
operations on subsets of processors. Chameleon also provides a way to
port programs written using PICL or Intel NX message passing to other
systems, including collections of workstations. The global climate model
(see Section ) used this feature to port to the SP1.
Chameleon itself was ported to the SP1 with no problems other than the need to
link Fortran programs statically. Both EUI and EUIH ports were provided, as
well as an IP port (using p4 (Section ) for the IP
transport). The EUIH port provided a simplified startup mechanism that
eliminated the need for the user to invoke the program with the shell
script cotb0.
Chameleon includes a set of programs
that test the communications performance of the system.
The twin program (written by Scott Berryman and William Gropp) tests
communication between pairs of processors, using a number of techniques to
remove ``occasional effects'' such as timer interrupts and the effect of
network activity from the timings. The message sizes tested are also chosen
adaptively to capture discontinuities in the behavior of the message-passing
system. One such discontinuity appears in the EUI results at 128 bytes: EUI
switches to a different protocol for longer messages, which adds
significantly to their latency. This effect is shown in Figure
.
The DELTA results also show a discontinuity (at 480 bytes); this reflects the
message packet size (minus the header) used on the DELTA.
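
The two ideas described above, filtering out occasional effects and choosing message sizes adaptively, can be illustrated with a small sketch. This is not the twin program's algorithm, which the text does not specify. It assumes a simulated cost model in place of real message passing, takes the minimum over repeated trials as one common way to discard interference, and bisects any interval where linear interpolation predicts the midpoint poorly. All function names and all constants other than the 128-byte threshold are hypothetical.

```python
import random


def simulated_time(nbytes, rng):
    """Hypothetical cost model (microseconds), standing in for a real send/receive.

    A protocol switch adds a fixed latency above 128 bytes, and an
    occasional interruption adds a large delay to some trials.
    """
    t = 30.0 + 0.1 * nbytes
    if nbytes > 128:
        t += 40.0  # assumed extra latency of the long-message protocol
    if rng.random() < 0.1:
        t += 200.0  # assumed occasional effect (e.g. a timer interrupt)
    return t


def robust_time(nbytes, rng, reps=50):
    """Minimum over repeated trials: interference only ever adds time."""
    return min(simulated_time(nbytes, rng) for _ in range(reps))


def adaptive_sizes(measure, lo, hi, tol=0.05, min_gap=1):
    """Sample measure(n) on [lo, hi], refining where the curve is not linear.

    An interval is bisected whenever the measured midpoint differs from the
    linear interpolation of its endpoints by more than tol (relative).
    Returns a sorted list of (size, time) pairs.
    """
    samples = {lo: measure(lo), hi: measure(hi)}
    pending = [(lo, hi)]
    while pending:
        a, b = pending.pop()
        if b - a <= min_gap:
            continue
        mid = (a + b) // 2
        samples[mid] = measure(mid)
        predicted = samples[a] + (samples[b] - samples[a]) * (mid - a) / (b - a)
        if abs(samples[mid] - predicted) > tol * samples[mid]:
            pending.append((a, mid))
            pending.append((mid, b))
    return sorted(samples.items())


def largest_jump(samples):
    """Return the adjacent pair of samples with the largest increase in time."""
    return max(zip(samples, samples[1:]), key=lambda p: p[1][1] - p[0][1])


if __name__ == "__main__":
    rng = random.Random(0)
    samples = adaptive_sizes(lambda n: robust_time(n, rng), 0, 1024)
    left, right = largest_jump(samples)
    print("sizes sampled:", [n for n, _ in samples])
    print("largest jump between", left[0], "and", right[0], "bytes")
```

Under this model the sampled sizes cluster around the protocol switch, while the linear regions on either side are covered by only a few points. A real test would replace simulated_time with a timed exchange between two processors.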
The performance for long messages for a variety of machines is shown in
Figure
.
These results suggest that communication-intensive programs that use EUIH
can expect good performance compared to other MPPs.
The EUI data is from a preproduction version of EUI.
Additional programs test collective operations (gop), compute the relative speed of communication links (tcomm), test all of the links for correct operation (stress), and compute the bisection bandwidth (bisect). An option allows the programs to display, in an X window, the amount of communication activity on a per-processor basis.
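
As a rough illustration of what a bisection-bandwidth measurement computes, the sketch below pairs each processor in one half of the machine with a partner in the other half and derives an aggregate rate from the time the pairs take to exchange data concurrently. It is not taken from the bisect program: the function names, the lower-half/upper-half split, and the assumption that each pair exchanges the same number of bytes in both directions are all hypothetical.

```python
def bisection_pairs(nprocs):
    """Pair rank i in the lower half with rank i + nprocs // 2 in the upper half."""
    if nprocs < 2 or nprocs % 2:
        raise ValueError("need an even number of processors (at least 2)")
    half = nprocs // 2
    return [(i, i + half) for i in range(half)]


def bisection_bandwidth(nbytes, pair_times):
    """Aggregate bytes per second crossing the cut.

    nbytes:     bytes sent in each direction by every pair (assumed equal)
    pair_times: elapsed seconds measured by each pair for its exchange;
                all pairs run concurrently, so the slowest one sets the
                elapsed time for the whole exchange.
    """
    elapsed = max(pair_times)
    return 2 * nbytes * len(pair_times) / elapsed


if __name__ == "__main__":
    pairs = bisection_pairs(8)
    # Hypothetical timings in seconds, one per pair.
    rate = bisection_bandwidth(1_000_000, [0.5, 0.4, 0.5, 0.25])
    print(pairs, rate)
```

A fixed split such as this measures the bandwidth across one particular cut. Depending on the network topology and on how processors are numbered, other cuts may yield a lower figure.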