Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems
|Title||Enabling Concurrent Multithreaded MPI Communication on Multicore Petascale Systems|
|Publication Type||Conference Paper|
|Year of Publication||2010|
|Authors||Dozsa, G, Kumar, S, Balaji, P, Buntinas, D, Goodell, D, Gropp, WD, Ratterman, J, Thakur, R|
|Conference Name||17th EuroMPI conference|
|Conference Location||Stuttgart, Germany|
With the ever-increasing number of cores per node on HPC systems, applications are increasingly using threads to exploit the shared memory within a node, combined with MPI across nodes. Achieving high performance when a large number of concurrent threads make MPI calls is a challenging task for an MPI implementation. We describe the design and implementation of our solution in MPICH2 to achieve high-performance multithreaded communication on the IBM Blue Gene/P. We use a combination of a multichannel-enabled network interface, fine-grained locks, lock-free atomic operations, and specially designed queues to provide a high degree of concurrent access while still maintaining MPI's message-ordering semantics. We present performance results that demonstrate that our new design improves the multithreaded message rate by a factor of 3.6 compared with the existing implementation on the BG/P. Our solutions are also applicable to other high-end systems that have parallel network access capabilities.
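The general idea of combining several independent channels with fine-grained locks, while still keeping messages to one destination in order, can be illustrated with a small sketch. This is not the MPICH2 or Blue Gene/P code described in the paper: it uses plain POSIX threads and C11 atomics instead of a network interface, and every name in it (`channel_t`, `channel_for_dest`, `send_msg`, `run_demo`, and the constants) is a hypothetical one chosen for illustration. The assumption is that each destination is pinned to one channel, so a per-channel lock is enough to order that destination's messages, and threads targeting different channels do not contend.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* All constants and names below are illustrative assumptions. */
#define NUM_CHANNELS     4     /* independent "network channels" */
#define NUM_DESTS        8     /* destination ranks */
#define NUM_THREADS      4     /* concurrent sender threads */
#define MSGS_PER_THREAD  1000  /* messages issued by each thread */

typedef struct {
    pthread_mutex_t lock;       /* fine-grained: one lock per channel */
    long next_seq[NUM_DESTS];   /* per-destination sequence counter, guarded by lock */
    long sent;                  /* messages injected on this channel, guarded by lock */
} channel_t;

static channel_t channels[NUM_CHANNELS];
static atomic_long total_sent;  /* lock-free global counter */

/* Pin each destination to one channel so its messages stay ordered. */
static int channel_for_dest(int dest) {
    return dest % NUM_CHANNELS;
}

/* "Send" one message: take only the lock of the channel that owns dest. */
static void send_msg(int dest) {
    int c = channel_for_dest(dest);
    assert(c >= 0 && c < NUM_CHANNELS);

    pthread_mutex_lock(&channels[c].lock);
    channels[c].next_seq[dest]++;   /* ordering decided under the channel lock */
    channels[c].sent++;
    pthread_mutex_unlock(&channels[c].lock);

    atomic_fetch_add(&total_sent, 1);  /* no lock needed for the global count */
}

static void *worker(void *arg) {
    int id = *(int *)arg;
    for (int i = 0; i < MSGS_PER_THREAD; i++) {
        send_msg((id + i) % NUM_DESTS);  /* spread traffic across destinations */
    }
    return NULL;
}

/* Run all sender threads and return the total number of messages sent. */
long run_demo(void) {
    pthread_t threads[NUM_THREADS];
    int ids[NUM_THREADS];

    atomic_store(&total_sent, 0);
    for (int c = 0; c < NUM_CHANNELS; c++) {
        pthread_mutex_init(&channels[c].lock, NULL);
        channels[c].sent = 0;
        for (int d = 0; d < NUM_DESTS; d++) {
            channels[c].next_seq[d] = 0;
        }
    }

    for (int t = 0; t < NUM_THREADS; t++) {
        ids[t] = t;
        pthread_create(&threads[t], NULL, worker, &ids[t]);
    }
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
    }

    for (int c = 0; c < NUM_CHANNELS; c++) {
        pthread_mutex_destroy(&channels[c].lock);
    }
    return atomic_load(&total_sent);
}
```

Calling `run_demo()` from a `main` function and compiling with thread support (for example `-pthread` on GCC or Clang) exercises the sketch. Replacing the per-channel locks with a single global lock would give the same counts but would serialize every sender, which is the kind of contention the abstract's fine-grained design aims to avoid.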