Argonne National Laboratory

Scalable Distributed Consensus to Support MPI Fault Tolerances

TitleScalable Distributed Consensus to Support MPI Fault Tolerances
Publication TypeReport
Year of Publication2011
AuthorsBuntinas, D
Date Published06/2011
Other NumbersANL/MCS-TM-314

As system sizes increase, the amount of time in which an application can run without experiencing a failure decreases. Exascale applications will need to address fault tolerance. In order to support algorithm-based fault tolerance, communication libraries will need to provide fault-tolerance features to the application. One important fault-tolerance operation is distributed consensus. This is used, for example, to collectively decide on a set of failed processes. This paper describes a scalable, distributed consensus algorithm that is used to support new MPI fault-tolerance features proposed by the MPI 3 Forum\'s fault-tolerance working group. The algorithm was implemented and evaluated on a 4,096-core Blue Gene/P. The implementation was able to perform a full-scale distributed consensus in 305 [mu]s and scaled logarithmically.