Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems

Title: Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems
Publication Type: Conference Paper
Year of Publication: 2010
Authors: Dozsa, G.; Kumar, S.; Balaji, P.; Buntinas, D.; Goodell, D.; Gropp, W. D.; Ratterman, J.; Thakur, R.
Conference Name: 17th EuroMPI conference
Date Published: 09/2010
Conference Location: Stuttgart, Germany
Other Numbers: ANL/MCS-P1761-0610

With the ever-increasing number of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across nodes. Achieving high performance when a large number of concurrent threads make MPI calls is a challenging task for an MPI implementation. We describe the design and implementation of our solution in MPICH2 to achieve high-performance multithreaded communication on the IBM Blue Gene/P. We use a combination of a multichannel-enabled network interface, fine-grained locks, lock-free atomic operations, and specially designed queues to provide a high degree of concurrent access while still maintaining MPI's message-ordering semantics. We present performance results that demonstrate that our new design improves the multithreaded message rate by a factor of 3.6 compared with the existing implementation on the BG/P. Our solutions are also applicable to other high-end systems that have parallel network access capabilities.
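
The idea of combining multiple channels with fine-grained locks while preserving message order can be illustrated with a small sketch. It is not the MPICH2 or Blue Gene/P implementation: the names `Channel`, `MultiChannelEndpoint`, and `run_demo`, and the rule that maps a sender/receiver pair to a channel, are all hypothetical and chosen only for illustration. The sketch assumes that each sender/receiver pair always uses the same channel, so that pair's messages stay in FIFO order, while different pairs can proceed under different locks. It uses ordinary locks only and does not model the lock-free atomic operations mentioned in the abstract.

```python
import threading
from collections import deque


class Channel:
    """One independent channel with its own lock and FIFO queue (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.queue = deque()


class MultiChannelEndpoint:
    """Illustrative endpoint: one lock per channel instead of one global lock."""

    def __init__(self, num_channels=4):
        self.channels = [Channel() for _ in range(num_channels)]

    def _channel_for(self, src, dst):
        # Assumed mapping: a given (src, dst) pair always lands on the same
        # channel, so its messages share one FIFO and cannot overtake each other.
        return self.channels[(src * 31 + dst) % len(self.channels)]

    def send(self, src, dst, payload):
        ch = self._channel_for(src, dst)
        with ch.lock:  # only this channel is locked, not the whole endpoint
            ch.queue.append((src, dst, payload))

    def drain(self):
        # Collect queued messages channel by channel, keeping per-channel order.
        out = []
        for ch in self.channels:
            with ch.lock:
                out.extend(ch.queue)
                ch.queue.clear()
        return out


def run_demo(num_threads=4, msgs_per_thread=100):
    """Have several threads send concurrently; return all messages received."""
    endpoint = MultiChannelEndpoint()

    def worker(tid):
        for seq in range(msgs_per_thread):
            endpoint.send(tid, 0, seq)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return endpoint.drain()
```

In this sketch, threads whose messages map to different channels never contend for the same lock, which is the property a per-channel locking scheme is meant to provide. Messages from any single sender still come out in the order they were sent.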