|Title||Energy-Performance Tradeoffs in Multilevel Checkpoint Strategies |
|Publication Type||Report |
|Year of Publication||2014 |
|Authors||Balaprakash, P, Gomez, LABautist, Bouguerra, MS, Wild, SM, Cappello, F, Hovland, PD |
|Other Numbers||ANL/MCS-P5138-0514 |
|Abstract||Increased complexity of computer architectures, consideration of power constraints, and expected failure rates of hardware components make the design and analysis of energy-efficient fault-tolerance schemes an increasingly challenging and important task. We develop run-time and energy models for multilevel checkpoint schemes and characterize when tradeoffs between expected runtime and energy usage exist. Using these models, we study FTI, a recently developed multilevel checkpoint library, on an IBM Blue Gene/Q. We show that FTI has a low energy footprint and that, consequently, optimal checkpoint-interval values with respect to time and energy are similar. We also explore the effect of general system-level parameters on run-time and energy tradeoffs. |