Argonne National Laboratory Mathematics and Computer Science Division
Publications

N. Desai, E. Lusk, D. Buettner, A. Cherry, and T. Voran, "Simulating Failures on Large-Scale Systems," 37th International Conference on Parallel Processing - Workshops, IEEE, 2008, pp. 103-108.

Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene/P systems implemented as a part of the Cobalt resource manager. The primary goal of this framework is to support system software development. We also present a hardware diagnostic system that we have implemented using this framework.

