Energy-Performance Tradeoffs in Multilevel Checkpoint Strategies

Title: Energy-Performance Tradeoffs in Multilevel Checkpoint Strategies
Publication Type: Report
Year of Publication: 2014
Authors: Balaprakash, P; Gomez, LABautist; Bouguerra, MS; Wild, SM; Cappello, F; Hovland, PD
Other Numbers: ANL/MCS-P5138-0514
Abstract: Increased complexity of computer architectures, consideration of power constraints, and expected failure rates of hardware components make the design and analysis of energy-efficient fault-tolerance schemes an increasingly challenging and important task. We develop run-time and energy models for multilevel checkpoint schemes and characterize when tradeoffs between expected run time and energy usage exist. Using these models, we study FTI, a recently developed multilevel checkpoint library, on an IBM Blue Gene/Q. We show that FTI has a low energy footprint and that, consequently, optimal checkpoint-interval values with respect to time and energy are similar. We also explore the effect of general system-level parameters on run-time and energy tradeoffs.