Analysis of the Tradeoffs between Energy and Run Time for Multilevel Checkpointing
|Publication Type||Conference Paper|
|Year of Publication||2014|
|Authors||Balaprakash, P, Gomez, LABautist, Bouguerra, MS, Wild, SM, Cappello, F, Hovland, PD|
|Conference Name||Proceedings of the 5th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS14)|
In high-performance computing, there is a perpetual hunt for performance and scalability. Supercomputers grow larger, offering improved computational science throughput. Nevertheless, as the number of system components and their interactions increases, the number of failures and the power consumption will increase rapidly. Energy and reliability are among the most challenging issues that need to be addressed for extreme-scale computing. We develop analytical models for run time and energy usage for multilevel fault-tolerance schemes. We use these models to study the tradeoff between run time and energy in FTI, a recently developed multilevel checkpoint library, on an IBM Blue Gene/Q. Our results show that the energy consumed by FTI is low and that the tradeoff between run time and energy is small. Using the analytical models, we explore the impact of various system-level parameters on run time and energy tradeoffs.