Argonne National Laboratory

Toward Reliable Validation of HPC Network Simulation Models

Title: Toward Reliable Validation of HPC Network Simulation Models
Publication Type: Report
Year of Publication: 2017
Authors: Mubarak, M; Jain, N; Domke, J; Wolfe, N; Ross, C; Li, K; Bhatele, A; Carothers, CD; Ma, K-L; Ross, RB
Report Number: ANL/MCS-P7069-0717
Abstract: While the high performance computing (HPC) community increasingly relies on simulations to co-design and optimize HPC interconnects, the simulation community lacks a coherent set of practices to follow when validating simulators and network models. Validation of HPC network simulation models is a multi-step process: selecting representative communication patterns, configuring the network model, designing the set of experiments, and finally documenting the outcome for reproducibility. In this paper, we present a set of recommended practices for each of these steps in the validation process. If the recommendations are followed, the end result should be a validated network model that can make reasonably accurate predictions and convince the community of the model's correctness.