Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost

Title: Improving the Performance of MPI Derived Datatypes by Optimizing Memory-Access Cost
Publication Type: Report
Year of Publication: 2003
Authors: Byna, S, Gropp, WD, Sun, X-H, Thakur, R
Date Published: 04/2003
Other Numbers: ANL/MCS-P1045-0403
Abstract

The MPI Standard supports derived datatypes, which allow users to describe noncontiguous memory layouts and communicate noncontiguous data with a single communication function. This feature enables an MPI implementation to optimize the transfer of noncontiguous data. In practice, however, few MPI implementations implement derived datatypes in a way that performs better than what the user can achieve by manually packing data into a contiguous buffer and then calling an MPI function. In this paper, we present a technique for improving the performance of derived datatypes by automatically using packing algorithms that are optimized for memory-access cost. These packing algorithms are memory-optimization techniques that the user cannot easily apply without advanced knowledge of the memory architecture. We present performance results for a matrix-transpose example demonstrating that our implementation of derived datatypes significantly outperforms both manual packing by the user and the existing derived-datatype code in the MPI implementation (MPICH).
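
As a rough illustration of what "packing optimized for memory-access cost" could mean for the matrix-transpose case, the sketch below packs the transpose of a row-major matrix into a contiguous buffer in two ways: a straightforward loop that reads the source with a large stride, and a tiled (cache-blocked) loop that works on small square blocks. The abstract does not say which optimizations the authors use, so tiling is an assumption chosen here as one common technique. The function names and the `block` parameter are hypothetical, and the code shows only the access order, not the speedup, which depends on a compiled implementation and the cache hierarchy.

```python
def pack_transpose_naive(a, n):
    """Pack the transpose of a row-major n x n matrix into a contiguous list.

    The inner loop reads a[] with stride n, which is the access pattern
    that tends to be cache-unfriendly for large n.
    """
    out = [0] * (n * n)
    for j in range(n):              # one source column at a time
        for i in range(n):          # stride-n reads from the source
            out[j * n + i] = a[i * n + j]
    return out


def pack_transpose_blocked(a, n, block=4):
    """Same result as pack_transpose_naive, but visiting block x block tiles.

    Assumption: tiling stands in for the paper's memory-optimized packing;
    the block size would normally be tuned to the cache, not fixed at 4.
    """
    out = [0] * (n * n)
    for ii in range(0, n, block):                       # tile row origin
        for jj in range(0, n, block):                   # tile column origin
            for i in range(ii, min(ii + block, n)):     # rows inside the tile
                for j in range(jj, min(jj + block, n)): # columns inside the tile
                    out[j * n + i] = a[i * n + j]
    return out
```

Both functions produce the same packed buffer; only the order in which memory is touched differs, which is the kind of detail a derived-datatype implementation can choose on the user's behalf.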

PDF: http://www.mcs.anl.gov/papers/P1045.pdf