Mathematics and Computer Science Division

Argonne National Laboratory

9700 S. Cass Ave., MCS-240, Argonne, IL 60439; tel: 630.252.1074; email: azamat.at.anl.gov

- A. Mametjanov, B. Norris, X. Zeng, B. Drewniak, J. Utke, M. Anitescu, P. Hovland. "Applying Automatic Differentiation to the Community Land Model." In Proceedings of the 6th International Conference on Automatic Differentiation (AD'12). Preprint ANL/MCS-P1993-0112, January 2012. [ pdf | slides ]
- A. Mametjanov, D. Lowell, C.-C. Ma, B. Norris. "Autotuning Stencil-Based Computations on GPUs." In Proceedings of the IEEE International Conference on Cluster Computing (Cluster'12). Preprint ANL/MCS-P2094-0512, May 2012. [ pdf | slides ]
- C. Choudary, J. Godwin, J. Holewinski, D. Karthik, D. Lowell, A. Mametjanov, B. Norris, G. Sabin, P. Sadayappan. "Stencil-aware GPU Optimization of Iterative Solvers." In Proceedings of the 12th Copper Mountain Conference on Iterative Methods (Copper'12). Preprint ANL/MCS-P3008-0712, July 2012. [ pdf ] Revised January 2013. [ pdf ]
- M. Min, J. Fu, A. Mametjanov. "Hybrid Programming and Performance for Beam Propagation Modeling." In Proceedings of the 11th International Computational Accelerator Physics Conference (ICAP'12). Preprint ANL/MCS-P3033-0812, August 2012. [ pdf ]

- Automatic Differentiation of the Community Land Model. 4th Annual Postdoctoral Research Symposium. Argonne National Laboratory. October 2011. [ pdf ]
- Autotuning for GPUs using Orio. CScADS'12 workshop: Libraries and Autotuning for Extreme-scale Applications. Snowbird, Utah. August 13-14, 2012. [ slides ]
- Performance Autotuning with Orio. 5th Annual Postdoctoral Research Symposium. Argonne National Laboratory. September 2012. [ pdf ]
- Domain-Specific Languages for Stencil Computations. CACHE'12 annual meeting: Communication Avoidance and Communication Hiding at the Extreme Scale. Berkeley, CA. December 6-7, 2012. [ slides ]

BibTeX bibliography

- What can an autotuner do? How about a 2x performance speedup:

The plot shows the performance of the sparse matrix-vector product on a block-diagonal banded matrix, with block width on the x-axis. CUSP, a state-of-the-art CUDA sparse linear algebra library, does not perform well on block-diagonal matrices. The hand-tuned block-diagonal kernel (SG-DIA) does not use the optimal CUDA performance parameters that the autotuner (Orio) finds.
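For readers unfamiliar with diagonal storage, here is a minimal CPU sketch in plain Python of a sparse matrix-vector product over a block-diagonal matrix stored by diagonals. It is a generic illustration of the DIA idea only: it is not the SG-DIA kernel, not CUSP's implementation, and not code generated by Orio. The function names `spmv_dia` and `block_diagonal_to_dia` are hypothetical and chosen for this example. Note that a block of width b occupies 2b - 1 diagonals, which is one reason the block width matters for performance.

```python
def spmv_dia(offsets, diagonals, x):
    """Compute y = A @ x for a square matrix stored by diagonals.

    diagonals[d][i] holds A[i][i + offsets[d]]; positions that fall
    outside the matrix are never read.
    """
    n = len(x)
    y = [0.0] * n
    for offset, diag in zip(offsets, diagonals):
        # Only rows whose column index i + offset lies inside the matrix.
        start = max(0, -offset)
        stop = min(n, n - offset)
        for i in range(start, stop):
            y[i] += diag[i] * x[i + offset]
    return y


def block_diagonal_to_dia(blocks):
    """Convert equal-sized dense square blocks into diagonal storage."""
    b = len(blocks[0])                    # block width
    n = b * len(blocks)                   # full matrix dimension
    offsets = list(range(-(b - 1), b))    # 2b - 1 diagonals
    diagonals = [[0.0] * n for _ in offsets]
    for k, block in enumerate(blocks):
        for r in range(b):
            for c in range(b):
                row = k * b + r
                # Entry A[row][row + (c - r)] lives on diagonal c - r.
                diagonals[(c - r) + (b - 1)][row] = block[r][c]
    return offsets, diagonals


if __name__ == "__main__":
    blocks = [[[1.0, 2.0], [3.0, 4.0]],
              [[5.0, 6.0], [7.0, 8.0]]]
    offsets, diagonals = block_diagonal_to_dia(blocks)
    print(spmv_dia(offsets, diagonals, [1.0, 1.0, 1.0, 1.0]))
    # prints [3.0, 7.0, 11.0, 15.0]
```

On a GPU the inner loop over rows would typically be distributed across threads, and choices such as threads per block are the kind of performance parameter an autotuner can search over; that mapping is an assumption of this sketch, not a description of the kernels measured above.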