Argonne National Laboratory

Reliable MPI-IO through Layout-Aware Replication

Title: Reliable MPI-IO through Layout-Aware Replication
Publication Type: Conference Paper
Year of Publication: 2011
Authors: Son, SW, Lang, S, Latham, R, Ross, RB, Thakur, R
Conference Name: Proc. 7th IEEE Int'l Workshop on Storage Network Architecture and Parallel I/O (SNAPI 2011)
Date Published: 05/2011
Conference Location: Denver, CO
Other Numbers: ANL/MCS-P1874-0411

The current deployment of petascale systems and the promise of future exascale systems have created unprecedented challenges in how to manage failures in such systems. While many parallel file systems provide some sort of redundancy mechanism to cope with failures, such systems rely heavily on a hardware-based solution such as RAID. In this paper, we propose a block replication approach to store data redundantly. The approach does not depend on file system fault-tolerance mechanisms. Rather, the approach replicates each file block transparently within MPI-IO, using replication-aware datatypes. File striping information is used to place blocks from each replica in a separate storage node. We have implemented this replication mechanism in the MPI-IO layer. Our experimental results using a microbenchmark and real MPI-IO applications with PVFS and Lustre demonstrate that block replication in MPI-IO can be achieved transparently.
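
The idea of using striping information to keep replicas on separate storage nodes can be illustrated with a small sketch. This is not the paper's implementation: it assumes simple round-robin striping, and the function names (`server_of`, `replica_offset`), the layout of replica regions within the file, and the parameter values are all hypothetical choices made for illustration. Each replica region is shifted by one extra stripe unit per replica, so that copy `k` of a block lands `k` servers away from the primary copy.

```python
def server_of(offset, stripe_size, num_servers):
    """Storage node holding the byte at `offset`, assuming round-robin striping."""
    return (offset // stripe_size) % num_servers


def replica_offset(offset, replica, extent, stripe_size, num_servers):
    """File offset of copy `replica` of the byte at `offset` (hypothetical layout).

    Replica 0 is the primary copy. Each later replica region starts after the
    data extent, padded to a whole stripe width, plus one extra stripe unit per
    replica. The extra stripe unit rotates the copy onto a different server.
    """
    if not 0 <= replica < num_servers:
        raise ValueError("replica index must be in [0, num_servers)")
    if not 0 <= offset < extent:
        raise ValueError("offset must lie inside the data extent")
    stripe_width = stripe_size * num_servers
    # Round the extent up to a multiple of the full stripe width.
    padded = -(-extent // stripe_width) * stripe_width
    return replica * (padded + stripe_size) + offset


if __name__ == "__main__":
    stripe, servers, extent = 64 * 1024, 4, 1024 * 1024  # assumed values
    for off in (0, stripe, 5 * stripe + 17):
        nodes = [
            server_of(replica_offset(off, k, extent, stripe, servers), stripe, servers)
            for k in range(3)
        ]
        print(f"offset {off}: copies on servers {nodes}")
```

Under these assumptions, copy `k` of any byte falls on server `(primary + k) mod num_servers`, so up to `num_servers` copies never share a node. In an MPI-IO setting, offsets of this kind would presumably be encoded in the file view's datatype rather than computed per access, but that mapping is outside the scope of this sketch.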