Inspector-Executor Load Balancing Algorithms for Block-Sparse Tensor Contractions
|Title||Inspector-Executor Load Balancing Algorithms for Block-Sparse Tensor Contractions|
|Publication Type||Conference Paper|
|Year of Publication||2012|
|Authors||Ozog, D, Hammond, JR, Dinan, J, Balaji, P, Shende, S, Malony, AD|
|Conference Name||ICS '13|
Good load-balancing methods are required in order to obtain scalability from the NWChem coupled-cluster module, which allows the detailed study of chemical problems by iteratively solving the Schrödinger equation with an accurate ansatz. In this application, a relatively large amount of task information can be obtained at minimal cost, which suggests that a static mapping of task groups to processors can be a simpler and more efficient alternative to centralized dynamic load balancing. The distributed tensor contractions are block sparse, and an a priori inspection can quickly distinguish non-null tasks and assign them cost estimates based on characteristics such as their dimensions. Architecture-specific and empirically driven performance models of the dominant SORT and DGEMM routines serve as cost estimators for a once-per-simulation static partitioning process. In this paper we demonstrate this inspector/executor technique, which improves the NWChem coupled-cluster module's execution time by as much as 50% at scale. The technique is applicable to any scientific application requiring load balance where performance models or estimates of kernel execution times are available.
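The inspector/executor idea described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the paper or from NWChem: the `Task` type, the `estimate_cost` model (a flop count for DGEMM plus an element count for SORT, with made-up rates), and the greedy longest-processing-time partitioner, which stands in for whatever static partitioner the authors actually use.

```python
import heapq
from typing import NamedTuple


class Task(NamedTuple):
    """Hypothetical tile contraction: an (m x k) block times a (k x n) block."""
    m: int
    n: int
    k: int


def estimate_cost(task, flop_rate=1.0e9, sort_rate=1.0e8):
    """Assumed cost model, in seconds. The rates are placeholders, not measured values."""
    dgemm = 2.0 * task.m * task.n * task.k / flop_rate
    # SORT is modeled as proportional to the number of elements touched.
    sort = (task.m * task.k + task.k * task.n + task.m * task.n) / sort_rate
    return dgemm + sort


def inspect(blocks):
    """Inspector: drop null tasks and attach a cost estimate to the rest."""
    return [(estimate_cost(task), task) for task, non_null in blocks if non_null]


def partition(costed_tasks, nprocs):
    """Static partition: give each task, largest first, to the least-loaded processor."""
    heap = [(0.0, p) for p in range(nprocs)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(nprocs)]
    for cost, task in sorted(costed_tasks, key=lambda ct: ct[0], reverse=True):
        load, p = heapq.heappop(heap)
        assignment[p].append(task)
        heapq.heappush(heap, (load + cost, p))
    loads = [0.0] * nprocs
    for load, p in heap:
        loads[p] = load
    return assignment, loads


if __name__ == "__main__":
    # Each entry pairs a candidate task with a flag saying whether it is non-null.
    blocks = [
        (Task(10, 10, 10), True),
        (Task(20, 20, 20), False),  # null by sparsity; the inspector skips it
        (Task(40, 40, 40), True),
        (Task(40, 40, 40), True),
        (Task(10, 10, 10), True),
    ]
    assignment, loads = partition(inspect(blocks), nprocs=2)
    for p, tasks in enumerate(assignment):
        print(p, loads[p], tasks)
```

In this sketch the executor phase would simply have each processor run the tasks in its own `assignment` entry, with no centralized task queue consulted at run time. The quality of the balance depends entirely on how well the cost model tracks real kernel times.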