Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology

Title: Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology
Publication Type: Conference Paper
Year of Publication: 2009
Authors: Shin, J, Hall, MW, Chame, J, Chen, C, Hovland, PD
Conference Name: The Fourth International Workshop on Automatic Performance Tuning
Date Published: 07/2009
Other Numbers: ANL/MCS-P1664-0809
Abstract

Autotuning technology has emerged recently as a systematic process for evaluating alternative implementations of a computation to select the best-performing solution for a particular architecture. Specialization customizes and optimizes code for a particular class of input data. This paper presents a compiler optimization approach that combines novel autotuning compiler technology with specialization for expected data set sizes of key computations, focused on matrix multiplication of small matrices. We describe compiler techniques developed for this approach, including the interface to a polyhedral transformation system for generating specialized code and the heuristics used to prune the enormous search space of alternative implementations. We demonstrate significantly better performance than directly using libraries such as GOTO, ATLAS and ACML BLAS that are not specifically optimized for the problem sizes at hand. In a case study of nek5000, a spectral-element-based code that extensively uses the specialized matrix multiply, we demonstrate a performance improvement for the full application of 36%.
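
The general idea of combining specialization with autotuning can be illustrated with a toy sketch. This is not the paper's compiler framework or its polyhedral transformation interface; it is a minimal Python illustration under assumed names (`make_variant`, `autotune`, both hypothetical). It generates one matrix-multiply variant per loop order, with the matrix sizes baked in as constants (specialization), then times each variant and keeps the fastest (autotuning by empirical search).

```python
import itertools
import timeit


def make_variant(order, m, n, k):
    """Generate a matmul specialized to fixed sizes (m, n, k).

    `order` is a permutation of the loop variables i, j, p. The loop
    bounds are emitted as literal constants, mimicking specialization
    for a known problem size. (Hypothetical helper for illustration.)
    """
    bounds = {"i": m, "j": n, "p": k}
    indent = "    "
    lines = ["def mm(A, B, C):"]
    for depth, var in enumerate(order):
        lines.append(f"{indent * (depth + 1)}for {var} in range({bounds[var]}):")
    lines.append(f"{indent * 4}C[i][j] += A[i][p] * B[p][j]")
    namespace = {}
    exec("\n".join(lines), namespace)  # compile the generated source
    return namespace["mm"]


def autotune(m, n, k, repeats=3, number=20):
    """Time every loop-order variant and return the fastest one.

    Returns (loop order as a string, best measured time, function).
    """
    A = [[i + p for p in range(k)] for i in range(m)]
    B = [[p - j for j in range(n)] for p in range(k)]
    best_order, best_time, best_fn = None, float("inf"), None
    for order in itertools.permutations("ijp"):
        fn = make_variant(order, m, n, k)

        def run(fn=fn):
            C = [[0] * n for _ in range(m)]  # fresh output each run
            fn(A, B, C)

        elapsed = min(timeit.repeat(run, repeat=repeats, number=number))
        if elapsed < best_time:
            best_order, best_time, best_fn = "".join(order), elapsed, fn
    return best_order, best_time, best_fn


if __name__ == "__main__":
    order, seconds, _ = autotune(8, 8, 8)
    print(f"fastest loop order for 8x8x8: {order} ({seconds:.6f} s)")
```

The search space here is only six loop orders, so exhaustive timing is feasible. A real autotuner also explores transformations such as tiling and unrolling, which is why the abstract stresses heuristics for pruning the search. The winning order reported by this sketch depends on the machine and interpreter, so no particular result should be expected.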

URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.155.3869
PDF: http://www.mcs.anl.gov/papers/P1664.pdf