
  
Chameleon

(Contributed by William Gropp)

Message passing is a common method for writing programs for distributed-memory parallel computers. Unfortunately, the lack of a standard for message passing has hampered the construction of portable and efficient parallel programs. In an attempt to remedy this problem, a number of groups have developed their own message-passing systems, each with its own strengths and weaknesses. Chameleon [9] is a second-generation system of this type. Rather than replacing these existing systems, Chameleon supplements them by providing a uniform way to access many of them. Its goals are to (a) be very lightweight (low overhead), (b) be highly portable, and (c) help standardize program startup and the use of emerging message-passing operations such as collective operations on subsets of processors. Chameleon also provides a way to port programs written using PICL or Intel NX message passing to other systems, including collections of workstations. The global climate model (see Section [*]) used this feature for its port to the SP1.

Chameleon was ported to the SP1 with no problems other than the need to link Fortran programs statically. Both an EUI and an EUIH port were provided, as well as an IP port (using p4 (Section [*]) for the IP transport). The EUIH port provided a simplified startup mechanism that eliminated the need for the user to invoke the program with the shell script cotb0.

Chameleon includes a set of programs that test the communications performance of the system. The twin program (written by Scott Berryman and William Gropp) tests communication between pairs of processors, using a number of techniques to remove "occasional effects" such as timer interrupts and network activity from the timings. The message sizes tested are chosen adaptively to capture discontinuities in the behavior of the message-passing system. One such discontinuity appears in the EUI results at 128 bytes: EUI switches to a different protocol for longer messages, which adds significantly to their latency. This effect is shown in Figure [*]. The DELTA results also show a discontinuity (at 480 bytes); this reflects the message packet size (minus the header) used on the DELTA. The performance for long messages for a variety of machines is shown in Figure [*]. These results show that communication-intensive programs that use EUIH can expect good performance compared to other MPPs. The EUI data is from a preproduction version of EUI.
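The two ideas described above, filtering out occasional slow samples and choosing message sizes adaptively, can be illustrated with a minimal Python sketch. This is not the twin program's actual algorithm, which is not described here. The cost model `synthetic_one_way_time`, its constants, and the helper names `filtered_time`, `adaptive_sizes`, and `largest_jump` are all hypothetical, chosen only to mimic a protocol switch at 128 bytes. The filter takes the minimum over repeated trials, and the size selection subdivides an interval whenever the midpoint timing departs from a straight line between the endpoints.

```python
import random


def synthetic_one_way_time(nbytes, rng):
    """Hypothetical cost model in microseconds (an assumption, not measured data)."""
    latency = 30.0 if nbytes <= 128 else 90.0  # assumed protocol switch at 128 bytes
    t = latency + 0.1 * nbytes
    if rng.random() < 0.1:  # occasional slow sample, e.g. a timer interrupt
        t += 500.0
    return t


def filtered_time(nbytes, rng, repeats=50):
    """Keep the minimum over many trials so rare slow samples are discarded."""
    return min(synthetic_one_way_time(nbytes, rng) for _ in range(repeats))


def adaptive_sizes(lo, hi, measure, tol=20.0, min_gap=1):
    """Measure lo and hi, then subdivide wherever the midpoint timing
    deviates from linear interpolation between the endpoints by more than tol."""
    results = {lo: measure(lo), hi: measure(hi)}
    pending = [(lo, hi)]
    while pending:
        a, b = pending.pop()
        if b - a <= min_gap:
            continue
        mid = (a + b) // 2
        results[mid] = measure(mid)
        predicted = results[a] + (results[b] - results[a]) * (mid - a) / (b - a)
        if abs(results[mid] - predicted) > tol:
            pending.append((a, mid))
            pending.append((mid, b))
    return dict(sorted(results.items()))


def largest_jump(results):
    """Return the adjacent pair of sampled sizes with the largest timing increase."""
    sizes = list(results)
    return max(zip(sizes, sizes[1:]), key=lambda p: results[p[1]] - results[p[0]])


if __name__ == "__main__":
    rng = random.Random(0)
    results = adaptive_sizes(0, 1024, lambda n: filtered_time(n, rng))
    print(len(results), "sizes sampled; largest jump between", largest_jump(results))
```

With this synthetic model, the refinement concentrates its samples around the assumed switch point and leaves the smooth regions sparsely sampled, which is the behavior that makes adaptive selection cheaper than a uniform sweep.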


  
Figure: Communication performance for small messages for a variety of machines.


  
Figure: Communication performance for long messages for a variety of machines. All SP1 results use the switch. Data for the CM-5 is unavailable for messages of this length.

Additional programs test collective operations (gop), compute the relative speed of communication links (tcomm), test all of the links for correct operation (stress), and compute the bisection bandwidth (bisect). An option lets the programs display the amount of communication activity per processor in an X window.
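As a rough illustration of what a bisection-bandwidth figure means, the sketch below estimates it from timings of simultaneous transfers across a cut that splits the processors into two equal halves. It is an assumption about one simple way to form the estimate, not a description of how bisect works. The function name `bisection_bandwidth` and its parameters are hypothetical.

```python
def bisection_bandwidth(nprocs, nbytes, pair_times):
    """Estimate bisection bandwidth in bytes per second.

    Assumed setup (hypothetical): processor i in the lower half sends
    nbytes to processor i + nprocs // 2 in the upper half, all at once,
    and pair_times[i] is the elapsed time in seconds for that transfer.
    """
    half = nprocs // 2
    if nprocs % 2 != 0 or len(pair_times) != half:
        raise ValueError("need an even processor count and one time per pair")
    total_bytes = nbytes * half   # bytes crossing the cut
    elapsed = max(pair_times)     # the slowest pair bounds the whole exchange
    return total_bytes / elapsed


if __name__ == "__main__":
    # Four pairs on eight processors, 1 MB per transfer (made-up timings).
    print(bisection_bandwidth(8, 1_000_000, [0.5, 0.4, 0.5, 0.25]))
```

Using the slowest pair as the elapsed time gives a conservative estimate; averaging the per-pair times would give a larger figure if some links are less contended than others.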


Karen D. Toonen
1998-11-18