LANS Publications

"Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology"

J. Shin, M. W. Hall, J. Chame, C. Chen, and P. D. Hovland

In The Fourth International Workshop on Automatic Performance Tuning. Also Preprint ANL/MCS-P1664-0809.

Preprint Version: [pdf]

Autotuning technology has emerged recently as a systematic process for evaluating alternative implementations of a computation to select the best-performing solution for a particular architecture. Specialization optimizes code customized to a particular class of input data. This paper presents a compiler optimization approach that combines novel autotuning compiler technology with specialization for expected data set sizes of key computations, focused on matrix multiplication of small matrices. We describe compiler techniques developed for this approach, including the interface to a polyhedral transformation system for generating specialized code and the heuristics used to prune the enormous search space of alternative implementations. We demonstrate significantly better performance than directly using libraries such as GOTO, ATLAS, and ACML BLAS, which are not specifically optimized for the problem sizes at hand. In a case study of nek5000, a spectral-element-based code that makes extensive use of the specialized matrix multiply, we demonstrate a performance improvement of 36% for the full application.
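The two ideas named in the abstract can be illustrated with a toy sketch. This is not the compiler framework described in the paper, which relies on a polyhedral transformation system and search-space pruning heuristics. It is only a minimal Python illustration, assuming a search space of the six loop orders of a triple-nested matrix multiply. The names `make_variant`, `reference`, and `autotune` are hypothetical. Specialization is modeled by baking the matrix size into the generated loop bounds, and autotuning by timing every variant and keeping the fastest.

```python
import itertools
import timeit


def make_variant(order, n):
    """Generate a matrix multiply specialized to size n and one loop order."""
    lines = ["def mm(A, B, C):"]
    for depth, var in enumerate(order):
        # The size n is baked in as a constant: this is the specialization step.
        lines.append("    " * (depth + 1) + f"for {var} in range({n}):")
    lines.append("    " * 4 + "C[i][j] += A[i][k] * B[k][j]")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["mm"]


def reference(A, B, n):
    """Straightforward matrix multiply used to check each variant."""
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def autotune(n, repeats=20):
    """Time every loop-order variant for size n and return (best order, time)."""
    A = [[float(i + j) for j in range(n)] for i in range(n)]
    B = [[float(i - j) for j in range(n)] for i in range(n)]
    expected = reference(A, B, n)
    best = None
    for order in itertools.permutations("ijk"):
        mm = make_variant(order, n)
        # Check correctness before timing; inputs are small integers stored
        # as floats, so the sums are exact in every loop order.
        C = [[0.0] * n for _ in range(n)]
        mm(A, B, C)
        assert C == expected
        elapsed = min(timeit.repeat(
            lambda: mm(A, B, [[0.0] * n for _ in range(n)]),
            number=repeats, repeat=3))
        if best is None or elapsed < best[1]:
            best = (order, elapsed)
    return best
```

Calling `autotune(8)` returns the fastest loop order found on the current machine together with its measured time. The winning order may differ between machines and runs, which is the reason an empirical search is used rather than a fixed choice.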