D. Buntinas, "Scalable Distributed Consensus to Support MPI Fault Tolerance," Lecture Notes in Computer Science, vol. 6960, Springer, 2012, pp. 325-328. Also Preprint ANL/MCS-P2027-0212, February 2012. [pdf]
As system sizes increase, the amount of time in which an application can run without experiencing a failure decreases. Exascale applications will need to address fault tolerance. In order to support algorithm-based fault tolerance, communication libraries will need to provide fault-tolerance features to the application. One important fault-tolerance operation is distributed consensus. This is used, for example, to collectively decide on a set of failed processes. This paper describes a scalable, distributed consensus algorithm that is used to support new MPI fault-tolerance features proposed by the MPI 3 Forum's fault-tolerance working group. The algorithm was implemented and evaluated on a 4,096-core Blue Gene/P. The implementation was able to perform a full-scale distributed consensus in 305 us and scaled logarithmically.