Improving the Performance of MPI Collective Communication on Switched Networks

Title: Improving the Performance of MPI Collective Communication on Switched Networks
Publication Type: Report
Year of Publication: 2002
Authors: Thakur, R, Gropp, WD
Date Published: 11/2002
Other Numbers: ANL/MCS-P1007-1102

<p>In this paper, we present new algorithms for improving the performance of collective communication operations in MPI. Our target architecture is a cluster of machines connected by a switched network such as Myrinet or switched Ethernet. We have developed new algorithms for all the MPI collective communication operations, namely, scatter/gather/reduce, allgather/allreduce, broadcast, reduce-scatter, all-to-all, and scan. We compare the performance of our new algorithms with the algorithms currently used in the latest version of MPICH on up to 256 nodes of a Myrinet-connected cluster. For operations such as scatter/gather/reduce, allgather/allreduce, and reduce-scatter, we observe an improvement of up to a factor of 10 for short message sizes. For operations such as broadcast and reduce-scatter and for long message sizes, the new algorithms are truly scalable: the time taken remains fairly constant as we increase the number of processes participating in the operation.</p>