Inspector-Executor Load Balancing Algorithms for Block-Sparse Tensor Contractions

Publication Type: Conference Paper
Year of Publication: 2012
Authors: Ozog, D; Hammond, JR; Dinan, J; Balaji, P; Shende, S; Malony, AD
Conference Name: ICS '13
Other Numbers: ANL/MCS-P3056-1112

Good load-balancing methods are required to achieve scalability in the NWChem coupled-cluster module, which enables detailed study of chemical problems by iteratively solving the Schrödinger equation with an accurate ansatz. In this application, a relatively large amount of task information can be obtained at minimal cost. This suggests that a static mapping of task groups to processors can be a simpler and more efficient alternative to centralized dynamic load balancing. The distributed tensor contractions are block sparse, so an a priori inspection can quickly distinguish non-null tasks and assign them cost estimates based on characteristics such as their dimensions. Architecture-specific, empirically driven performance models of the dominant SORT and DGEMM routines serve as the cost estimator for a once-per-simulation static partitioning process. In this paper we demonstrate this inspector/executor technique, which improves the execution time of the NWChem coupled-cluster module by as much as 50% at scale. The technique applies to any scientific application that requires load balance and has performance models or estimates of kernel execution times available.
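The inspector/executor flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not NWChem's implementation. The `Task` layout, the linear cost-model coefficients, and the helpers `estimate_cost`, `inspect`, and `partition` are all hypothetical. Block non-nullness is supplied as explicit sets of block indices rather than derived from symmetry, and SORT is modeled simply as a per-element index-permutation cost. For the static partitioning step, the sketch uses the greedy longest-processing-time (LPT) heuristic as a stand-in; the partitioner used in the paper may differ.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One block contraction C[i,j] += A[i,k] * B[k,j] (hypothetical layout)."""
    i: int
    j: int
    k: int
    m: int     # rows of the A block
    n: int     # columns of the B block
    kdim: int  # contracted (shared) dimension


# Hypothetical coefficients; a real model would be fit empirically per architecture.
DGEMM_SEC_PER_FLOP = 1.0e-10
SORT_SEC_PER_ELEM = 2.0e-9
OVERHEAD_SEC = 1.0e-6


def estimate_cost(t: Task) -> float:
    """Estimated runtime: DGEMM flops plus a SORT (permutation) of all three blocks."""
    flops = 2.0 * t.m * t.n * t.kdim
    elems = t.m * t.kdim + t.kdim * t.n + t.m * t.n
    return OVERHEAD_SEC + DGEMM_SEC_PER_FLOP * flops + SORT_SEC_PER_ELEM * elems


def inspect(candidates, nonnull_a, nonnull_b):
    """Inspector: keep only tasks whose A and B input blocks are both non-null."""
    return [t for t in candidates
            if (t.i, t.k) in nonnull_a and (t.k, t.j) in nonnull_b]


def partition(tasks, nprocs):
    """Static mapping: greedy LPT, largest estimated cost to least-loaded processor."""
    heap = [(0.0, p) for p in range(nprocs)]  # (accumulated cost, processor id)
    assignment = {p: [] for p in range(nprocs)}
    for t in sorted(tasks, key=estimate_cost, reverse=True):
        load, p = heapq.heappop(heap)
        assignment[p].append(t)
        heapq.heappush(heap, (load + estimate_cost(t), p))
    return assignment


# Usage: 2x2 output blocks, 3 contracted blocks of growing size, 2 processors.
candidates = [Task(i, j, k, 32, 32, 16 * (k + 1))
              for i in range(2) for j in range(2) for k in range(3)]
nonnull_a = {(0, 0), (0, 2), (1, 1)}          # non-null (i, k) blocks of A
nonnull_b = {(0, 0), (1, 1), (2, 0), (2, 1)}  # non-null (k, j) blocks of B
tasks = inspect(candidates, nonnull_a, nonnull_b)
plan = partition(tasks, nprocs=2)
loads = {p: sum(estimate_cost(t) for t in ts) for p, ts in plan.items()}
print(loads)
```

Because inspection and partitioning run once per simulation, each processor can afterwards execute its own task list without consulting a centralized dynamic scheduler. LPT is chosen here only because it is short and has a well-known bound: the largest load never exceeds the average load plus the single most expensive task.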