Exploring the Optimization Space for Build to Order Matrix Algebra

TitleExploring the Optimization Space for Build to Order Matrix Algebra
Publication TypeConference Paper
Year of Publication2011
AuthorsBelter, G, Jessup, E, Karlin, I, Nelson, T, Norris, B, Siek, JG
Date Published05/2011
Other NumbersANL/MCS-P1890-0511

The Build to Order (BTO) system compiles a sequence of matrix and vector operations into a high-performance C program for a given architecture. We focus on optimizing programs where memory traffic is the bottleneck. Loop fusion and data parallelism play an important role in this context, but applying them at every opportunity does not necessarily lead to the best performance. We present an empirical and exhaustive characterization of the optimization space for these two optimizations reporting its size and how many points in the space are close to the fastest option. We show how optimizations of different parts of the program affect one another and how the best choices depend on the computer system. We also evaluate the suitability of several algorithms for searching the space. We leverage these findings to ensure that the BTO compiler produces kernels that outperform vendor-tuned BLAS on a variety of modern computer architectures.