A Multilevel Approach to Topology-Aware Collective Operations in Computational Grids

Publication Type: Report
Year of Publication: 2002
Authors: Karonis, NT; de Supinski, BR; Foster, IT; Gropp, WD; Lusk, EL; Lacour, S
Date Published: 04/2002
Other Numbers: ANL/MCS-P948-0402

The efficient implementation of collective communication operations has received much attention. Initial efforts produced "optimal" trees based on network communication models that assumed equal point-to-point latencies between any two processes. This assumption is violated in most practical settings, however, particularly in heterogeneous systems such as clusters of SMPs and wide-area "computational Grids," with the result that collective operations perform suboptimally. In response, more recent work has focused on creating topology-aware trees for collective operations that minimize communication across slower channels (e.g., a wide-area network). While these efforts yield significant communication benefits, they all limit their view of the network to only two layers. We present a strategy based on a multilayer view of the network. By creating multilevel topology-aware trees, we take advantage of communication cost differences at every level in the network. We used this strategy to implement topology-aware versions of several MPI collective operations in MPICH-G2, the Globus Toolkit-enabled version of the popular MPICH implementation of the MPI standard. Using information about topology provided by MPICH-G2, we construct these multilevel topology-aware trees automatically during execution. We present results demonstrating the advantages of our multilevel approach by comparing it to the default (topology-unaware) implementation provided by MPICH and to a topology-aware two-layer implementation.
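To make the idea of a multilevel topology-aware tree concrete, here is a minimal Python sketch, offered under stated assumptions rather than as the report's implementation. It assumes each process is described by a tuple of cluster labels, one per network level, ordered from slowest (e.g., wide-area site) to fastest (e.g., a single SMP node). This is loosely in the spirit of the topology information MPICH-G2 provides, but the label format and the functions `build_multilevel_tree` and `messages_per_level` are hypothetical and are not MPICH-G2's API. The tree sends one message from the root to a representative of each remote cluster at the slowest level, then repeats the same step inside every cluster at the next, faster level.

```python
def build_multilevel_tree(colors, root):
    """Build a multilevel topology-aware broadcast tree (illustrative sketch).

    colors[p] is a tuple of hypothetical cluster labels for process p, one
    per level, ordered from the slowest network level (e.g. wide-area site)
    to the fastest (e.g. SMP node). All tuples must have the same length.
    Returns a dict mapping each non-root process to the process it
    receives the message from.
    """
    depth = len(colors[root])
    parent = {}

    def build(group, local_root, level):
        if level == depth:
            # Fastest level reached: the local root sends directly to every
            # other process in the group (flat fan-out, for simplicity).
            for p in group:
                if p != local_root:
                    parent[p] = local_root
            return
        # Partition the group by its label at this level; dicts keep
        # insertion order, so the result is deterministic.
        clusters = {}
        for p in group:
            clusters.setdefault(colors[p][level], []).append(p)
        for members in clusters.values():
            if local_root in members:
                rep = local_root
            else:
                # Exactly one message crosses this (slower) level per remote
                # cluster: from the local root to the cluster's representative.
                rep = members[0]
                parent[rep] = local_root
            # Recurse inside the cluster at the next, faster level.
            build(members, rep, level + 1)

    build(list(range(len(colors))), root, 0)
    return parent


def messages_per_level(colors, parent):
    """Count tree edges by the slowest level each edge crosses.

    Index i < depth counts edges whose endpoints first differ at level i;
    index depth counts edges inside a single fastest-level cluster.
    """
    depth = len(colors[0])
    counts = [0] * (depth + 1)
    for child, par in parent.items():
        level = next(
            (i for i in range(depth) if colors[child][i] != colors[par][i]),
            depth,
        )
        counts[level] += 1
    return counts


# Hypothetical layout: 2 sites x 2 machines x 2 processes, root = 0.
colors = [("A", "m1"), ("A", "m1"), ("A", "m2"), ("A", "m2"),
          ("B", "m3"), ("B", "m3"), ("B", "m4"), ("B", "m4")]
tree = build_multilevel_tree(colors, root=0)
print(tree)                              # {1: 0, 2: 0, 3: 2, 4: 0, 5: 4, 6: 4, 7: 6}
print(messages_per_level(colors, tree))  # [1, 2, 4]
```

In this layout only one message crosses the wide-area level, two cross the machine level, and the remaining four stay within a machine. Picking the lowest-numbered member as each cluster's representative and using a flat fan-out at the bottom level are arbitrary simplifications in this sketch. A practical implementation might instead use, for example, a binomial tree within each level.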