Here is the IASDS workshop presentation from September 4, 2009.
Phil Carns will be presenting a paper entitled “24/7 Characterization of Petascale I/O Workloads” at the inaugural Workshop on Interfaces and Architectures for Scientific Data Storage (IASDS). The event will be held on Friday, September 4th in New Orleans in conjunction with Cluster 2009.
This is the home page for Darshan, a scalable HPC I/O characterization tool. Darshan is designed to capture an accurate picture of application I/O behavior, including properties such as patterns of access within files, with minimum overhead. The name is taken from a Sanskrit word for “sight” or “vision”.
Darshan can be used to investigate and tune the I/O behavior of complex HPC applications. In addition, Darshan’s lightweight design makes it suitable for full-time deployment for workload characterization of large systems. We hope that such studies will help the storage research community to better serve the needs of scientific computing.
Darshan was originally developed on the IBM Blue Gene series of computers deployed at the Argonne Leadership Computing Facility, but it is portable across a wide variety of platforms, including the Cray XE6, Cray XC30, and Linux clusters. Darshan routinely instruments jobs using up to 786,432 compute cores on the Mira system at ALCF.
You will find current news about the Darshan project posted below. Additional documentation and details about Darshan are available from the links at the top of this page.
Last week I did some comparative runs of the “testpio” kernel to find out why pnetcdf I/O was slower than raw binary MPI-IO. In this scenario, 512 cores write a 51 MB file ten times.
There were some minor differences: the binary (MPI-IO) path uses an indexed-block datatype, while pnetcdf uses a subarray datatype. Pnetcdf also syncs the file a few more times: it calls MPI_FILE_SYNC when exiting define mode, but I think we will change that soon.
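To see why the datatype difference counts as minor, here is a rough sketch in plain Python (no MPI required) of how a subarray description and an indexed-block description can name exactly the same regions of a file. The helper names `subarray_blocks` and `indexed_blocks` are hypothetical, and the 4x6 array with a 2x3 piece is made up for illustration; it is not the actual testpio decomposition. The sketch only models the (displacement, length) pairs, in elements, that the MPI constructors MPI_Type_create_subarray (with C ordering) and MPI_Type_create_indexed_block would describe, and it ignores details such as type extent.

```python
from itertools import product


def subarray_blocks(sizes, subsizes, starts):
    """Return (displacement, length) pairs, in elements, covered by a
    row-major (C-order) subarray of a larger array."""
    ndim = len(sizes)
    # Row-major strides of the full array, in elements.
    strides = [1] * ndim
    for d in range(ndim - 2, -1, -1):
        strides[d] = strides[d + 1] * sizes[d + 1]
    # Each contiguous run spans the fastest-varying dimension of the piece.
    run = subsizes[-1]
    blocks = []
    for idx in product(*(range(n) for n in subsizes[:-1])):
        disp = starts[-1] * strides[-1]
        for d, i in enumerate(idx):
            disp += (starts[d] + i) * strides[d]
        blocks.append((disp, run))
    return blocks


def indexed_blocks(blocklength, displacements):
    """Return (displacement, length) pairs for equal-length blocks placed
    at explicit displacements, as an indexed-block type would describe."""
    return [(d, blocklength) for d in displacements]


# Hypothetical layout: a 2x3 piece starting at row 1, column 2 of a 4x6 array.
via_subarray = subarray_blocks(sizes=(4, 6), subsizes=(2, 3), starts=(1, 2))
via_indexed = indexed_blocks(blocklength=3, displacements=[8, 14])
print(via_subarray == via_indexed)
```

Under these assumptions both descriptions reduce to the same two runs of three elements, so the choice of constructor would not by itself change which bytes each process writes.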