Argonne National Laboratory

Formal Methods Applied to HPC Software Design: A Case Study of Locking Based on MPI One-Sided Communication

TitleFormal Methods Applied to HPC Software Design: A Case Study of Locking Based on MPI One-Sided Communication
Publication TypeJournal
Year of Publication2007
AuthorsPervez, S, Gopalakrishnan, G, Kirby, G, Thakur, R, Gropp, WD
Other NumbersANL/MCS-P1433-0607
AbstractThere is growing need to address the complexity of modeling and verifying the numerous concurrent protocols involved in high-performance computing (HPC) software design. Finite-state modeling and formal verification (FV) using model-checking technology have made impressive inroads into debugging concurrent protocols. However, there remains a dearth of research in applying model-checking methods to HPC software design. This situation can be attributed to the lack of awareness by the HPC community about the potentials of FV in the HPC arena, as well as lack of awareness by the FV community of the challenges that HPC application domains pose. In this paper, we demonstrate the utility of finite-state modeling and model checking in detecting race conditions and deadlocks in concurrent protocols that arise while developing HPC software. In particular, we detail a case study that develops a distributed byte-range locking algorithm using MPI's one-sided communication. Our model-checking effort detected a race condition that can cause deadlock, of which the authors of the algorithm were unaware. We describe two designs to fix the deadlock problem, and we present their formal analysis using model checking, and their performance analysis on a 128-node cluster. Our objective is to give practitioners, especially those developing parallel programs using libraries, the opportunity to accurately assess the costs and benefits of finite-state modeling and model checking.
PDFhttp://www.mcs.anl.gov/papers/P1433.pdf