Argonne National Laboratory

Hierarchical Collectives in MPICH2

Title: Hierarchical Collectives in MPICH2
Publication Type: Conference Paper
Year of Publication: 2009
Authors: Zhu, H, Goodell, D, Gropp, WD, Thakur, R
Conference Name: Recent Advances in Parallel Virtual Machine and Message Passing Interface
Date Published: 08/2009
Publisher: Springer Berlin / Heidelberg
Conference Location: Espoo, Finland
Other Numbers: ANL/MCS-P1622-0509

Most parallel systems on which MPI is used are now hierarchical: some processors are much closer to others in terms of interconnect performance. One of the most common examples is a system whose nodes are symmetric multiprocessors (including "multicore" processors). A number of papers have developed algorithms and implementations that exploit shared memory on such nodes to provide optimized collective operations, and these show significant performance benefits compared to implementations that do not exploit the hierarchical structure of the nodes. However, shared memory between processes is often a scarce resource. How necessary is it to use shared memory for collectives in MPI? How much of the performance benefit comes from tailoring the algorithm to the hierarchical topology of the system? In this paper, we describe an implementation based entirely on message-passing primitives that nonetheless exploits knowledge of the two-level hierarchy. We discuss both rootless collectives (such as Allreduce) and rooted collectives (such as Reduce), and develop a performance model. Our results show that for most collectives, exploiting shared memory directly brings little additional benefit, and the cases where shared memory is beneficial suggest design approaches that make the best use of a pool of shared memory.
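The two-level scheme the abstract describes can be illustrated in plain code. The sketch below simulates a hierarchical Allreduce as three message-passing phases (intra-node reduce to a per-node leader, inter-node combine among leaders, intra-node broadcast) without using a real MPI library; the function name, rank layout, and reduction operator are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: a simulated two-level (hierarchical) Allreduce.
# Ranks are laid out contiguously on nodes; local rank 0 is the
# node leader. All names here are illustrative, not from the paper.

def hierarchical_allreduce(values, ranks_per_node, op=lambda a, b: a + b):
    """values[r] is rank r's contribution; returns the result each
    rank would hold after a two-level Allreduce."""
    nprocs = len(values)
    nodes = [list(range(i, min(i + ranks_per_node, nprocs)))
             for i in range(0, nprocs, ranks_per_node)]

    # Phase 1: intra-node reduce -- each non-leader sends its value
    # to the node leader, which accumulates it.
    leader_vals = []
    for node in nodes:
        acc = values[node[0]]
        for r in node[1:]:
            acc = op(acc, values[r])
        leader_vals.append(acc)

    # Phase 2: inter-node allreduce among the node leaders only,
    # so only one process per node uses the interconnect.
    total = leader_vals[0]
    for v in leader_vals[1:]:
        total = op(total, v)

    # Phase 3: intra-node broadcast of the global result to all ranks.
    return [total] * nprocs

# Example: 8 ranks, 4 per node, summing rank ids 0..7.
print(hierarchical_allreduce(list(range(8)), 4))  # every rank gets 28
```

In a real MPI implementation the same structure would be expressed with a node-local communicator and a leaders-only communicator; the point of the sketch is that only leaders communicate across nodes, which is what makes the algorithm topology-aware even without shared memory.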