MPI+Threads Applications at Scale: A Case Study with Parallel Breadth-First Search
|Year of Publication||2014|
|Authors||Amer, A, Lu, H, Balaji, P, Matsuoka, S|
With the increasing prominence of manycore architectures and the decreasing amount of memory available per core on large supercomputers, a growing number of applications are investigating hybrid MPI+threads programming, which uses the available computational units while letting threads share memory. A process-only model that runs one MPI process per core can effectively use the available processing units, but it fails to fully utilize the memory hierarchy. The hybrid MPI+threads model, on the other hand, can handle intranode parallelism more effectively, but it can suffer from the locking and memory-consistency overheads associated with data sharing. In addition, hybrid MPI+threads models can alleviate some of the overheads of internode communication by allowing more coarse-grained data movement between address spaces, while different threads still perform fine-grained accesses to data within the same address space. These intricacies are often invisible at small scales but become highly prominent on large-scale systems, where they cause performance bottlenecks and scalability limitations.