A Case Study for Scientific I/O: Improving the FLASH Astrophysics Code

Title: A Case Study for Scientific I/O: Improving the FLASH Astrophysics Code
Publication Type: Journal Article
Year of Publication: 2011
Authors: Latham, R; Daley, C; Liao, W-K; Gao, K; Ross, RB; Dubey, A; Choudhary, A
Journal: A case study for scientific I/O: improving the FLASH astrophysics code
Volume: 5
Date Published: 01/2011
Other Numbers: ANL/MCS-P1819-0111
Abstract

The FLASH code is a computational science tool for simulating and studying thermonuclear reactions. The program periodically outputs large checkpoint files (to resume a calculation from a particular point in time) and smaller plot files (for visualization and analysis). Initial experiments on BlueGene/P spent excessive time in I/O, making it difficult to do actual science. Our investigation of time spent in I/O revealed several locations in the I/O software stack where we could make improvements. Fixing data corruption in the MPI-IO library allowed us to use collective I/O, yielding an order of magnitude improvement. Restructuring the data layout provided a more efficient I/O access pattern and yielded another doubling of performance, but broke format assumptions made by other tools in the application workflow. Using new nonblocking APIs in the Parallel-NetCDF library allowed us to keep high performance and maintain backward compatibility. While these optimizations required a detailed understanding of both the FLASH application and the I/O system software, this work demonstrates how collaboration between application and computer science groups can magnify each other's efforts.

PDF: http://www.mcs.anl.gov/papers/P1819.pdf