B. Norris, A. Hartono, E. Jessup, and J. Siek, "Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes," Proceedings of the 9th International Conference on Computational Science: Part I, Baton Rouge, LA, USA, Springer-Verlag, 2009, pp. 248-258. Also Preprint ANL/MCS-P1581-0209, February 2009.
The development of optimized codes is time-consuming and requires extensive architecture, compiler, and language expertise; therefore, computational scientists are often forced to choose between investing considerable time in tuning code or accepting lower performance. In this paper, we describe the first steps toward a fully automated system for the optimization of the matrix algebra kernels that are a foundational part of many scientific applications. To generate highly optimized code from a high-level MATLAB prototype, we define a three-step approach. To begin, we have developed a compiler that converts a MATLAB script into simple C code. We then use the polyhedral optimization system PLuTo to optimize that code for coarse-grained parallelism and locality simultaneously. Finally, we annotate the resulting code with performance-tuning directives and use the empirical performance-tuning system Orio to generate many tuned versions of the same operation using different optimization techniques, such as loop unrolling and memory alignment. Orio performs an automated empirical search to select the best among the multiple optimized code variants. We discuss performance results on two architectures.
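The final step (generate several variants of one kernel, then time them and keep the fastest) can be illustrated with a small sketch. It rests on stated assumptions and is not Orio's annotation language or interface. Python source generation stands in for Orio's C code generation, a plain matrix-vector product y = A x stands in for a composed kernel, and the unroll factor is the only tuning parameter. The names `generate_matvec_source`, `compile_variant`, and `empirical_search` are hypothetical. Because Python timings say little about compiled C performance, the sketch only shows the shape of the search.

```python
import math
import time


def generate_matvec_source(unroll):
    """Emit Python source for y = A*x with the inner loop unrolled `unroll` times (hypothetical generator)."""
    body = " + ".join(f"row[j + {k}] * x[j + {k}]" for k in range(unroll))
    return (
        "def matvec(A, x):\n"
        "    n = len(x)\n"
        f"    main = n - n % {unroll}\n"           # end of the fully unrolled part
        "    y = []\n"
        "    for row in A:\n"
        "        acc = 0.0\n"
        f"        for j in range(0, main, {unroll}):\n"
        f"            acc += {body}\n"              # unrolled loop body
        "        for j in range(main, n):\n"        # remainder (cleanup) loop
        "            acc += row[j] * x[j]\n"
        "        y.append(acc)\n"
        "    return y\n"
    )


def compile_variant(unroll):
    """Turn the generated source into a callable kernel variant."""
    namespace = {}
    exec(generate_matvec_source(unroll), namespace)
    return namespace["matvec"]


def empirical_search(unroll_factors, A, x, repeats=5):
    """Time each variant on (A, x) and return (fastest unroll factor, timings in seconds)."""
    reference = compile_variant(1)(A, x)
    timings = {}
    for u in unroll_factors:
        kernel = compile_variant(u)
        y = kernel(A, x)
        # Discard variants whose output differs from the unroll=1 reference.
        if not all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(y, reference)):
            continue
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(A, x)
            best = min(best, time.perf_counter() - start)
        timings[u] = best
    fastest = min(timings, key=timings.get)
    return fastest, timings


if __name__ == "__main__":
    n = 200
    A = [[float((i * n + j) % 7) for j in range(n)] for i in range(n)]
    x = [float(j % 5) for j in range(n)]
    fastest, timings = empirical_search([1, 2, 4, 8], A, x)
    print("fastest unroll factor on this machine:", fastest)
```

Keeping the best of several repetitions, rather than the mean, reduces the effect of timing noise. The reference check guards against a variant that is fast but wrong. A real tuner would search a much larger space (unroll factors, tile sizes, alignment) on the target hardware.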