Using Massively Parallel Simulation for MPI Collective Communication Modeling in Extreme-Scale Networks
|Year of Publication||2014|
|Authors||Mubarak, M, Carothers, CD, Ross, RB, Carns, PH|
MPI collective operations are a critical and frequently used part of most MPI-based large-scale scientific applications. In previous work, we enabled the Rensselaer Optimistic Simulation System (ROSS) to predict the performance of MPI point-to-point messaging on high-fidelity million-node network simulations of torus and dragonfly interconnects. The main contribution of this work is an extension of these torus and dragonfly network models to support MPI collective communication operations using the optimistic event scheduling capability of ROSS. We demonstrate that both small- and large-scale ROSS collective communication models can execute efficiently on massively parallel architectures. We validate the results of our collective communication model against measurements from IBM Blue Gene/Q and Cray XC30 platforms using a data-driven approach on our network simulations. We also perform experiments to explore the impact of tree degree on the performance of collective communication operations in large-scale network models.
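
As a rough illustration of why tree degree matters for a tree-based collective, the sketch below lays out ranks in a k-ary tree rooted at rank 0 and reports the tree depth. It is not taken from the paper or from ROSS: the heap-style rank layout and the function names `children`, `parent`, and `tree_depth` are assumptions made for this example only.

```python
def children(rank, degree, n):
    """Child ranks of `rank` in a degree-ary tree over ranks 0..n-1 (assumed heap-style layout)."""
    first = rank * degree + 1
    return [c for c in range(first, first + degree) if c < n]


def parent(rank, degree):
    """Parent rank of `rank`, or None for the root (rank 0)."""
    return None if rank == 0 else (rank - 1) // degree


def tree_depth(n, degree):
    """Number of hops from the root to the deepest rank.

    With the heap-style layout, the highest-numbered rank is always on the last level,
    so walking up from rank n-1 gives the depth.
    """
    depth = 0
    rank = n - 1
    while rank != 0:
        rank = (rank - 1) // degree
        depth += 1
    return depth


if __name__ == "__main__":
    n = 1024  # hypothetical number of ranks
    for degree in (2, 4, 8):
        print(f"degree={degree}: depth={tree_depth(n, degree)}, "
              f"root children={children(0, degree, n)}")
```

A larger degree gives a shallower tree, so a message crosses fewer levels, but each parent must then exchange messages with more children at every level. How that trade-off plays out on a given interconnect is the kind of question the tree-degree experiments address; the sketch itself makes no performance claim.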