Fault-Aware Utility-Based Job Scheduling on BlueGene/P Systems
| Field | Value |
|---|---|
| Title | Fault-Aware Utility-Based Job Scheduling on BlueGene/P Systems |
| Publication Type | Conference Paper |
| Year of Publication | 2009 |
| Authors | Buettner, D.; Desai, N. L.; Lan, Z.; Tang, W. |
| Conference Name | 2009 IEEE Conference on Cluster Computing (Cluster 2009) |
| Conference Location | New Orleans, LA |
Job scheduling on large-scale systems is an increasingly complicated affair, with numerous factors influencing scheduling policy. Addressing these factors results in sophisticated scheduling policies that can be difficult to reason about. In this paper, we present a general utility-based scheduling framework that balances different scheduling requirements and priorities. It enables system owners to customize scheduling policies for different circumstances without changing the scheduling code. We also develop a fault-aware job allocation strategy for Blue Gene/P systems to address the growing concern of system failures. We demonstrate the effectiveness of these facilities through event-driven simulations using real job traces collected from the production Blue Gene/P system at Argonne National Laboratory (ANL).
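The two ideas in the abstract, a pluggable utility function that orders the queue and an allocator that takes failure risk into account, can be illustrated with a small sketch. This is not the paper's implementation. Every name here (`Job`, `Partition`, `POLICIES`, `order_queue`, `pick_partition`, the `failure_prob` field) is hypothetical, and the two example utility functions (wait time and expansion factor) are common textbook choices, not necessarily the ones the authors use. The point it shows is that switching policy means selecting a different entry in a table of utility functions, while the scheduling loop itself stays unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    job_id: str
    nodes: int          # requested node count
    walltime: float     # requested runtime in seconds
    queued_time: float  # seconds the job has waited so far


@dataclass
class Partition:
    name: str
    size: int            # nodes in the partition
    failure_prob: float  # hypothetical predicted chance of a failure during the job


# A utility function maps a job to a score; higher scores are scheduled first.
UtilityFn = Callable[[Job], float]


def wait_time_utility(job: Job) -> float:
    # Favors the longest-waiting job, giving FCFS-like ordering.
    return job.queued_time


def expansion_factor_utility(job: Job) -> float:
    # (wait + runtime) / runtime: favors short jobs that have waited relatively long.
    return (job.queued_time + job.walltime) / job.walltime


# Hypothetical policy table: owners change policy by name, not by editing order_queue.
POLICIES: Dict[str, UtilityFn] = {
    "fcfs": wait_time_utility,
    "xf": expansion_factor_utility,
}


def order_queue(jobs: List[Job], policy: str) -> List[Job]:
    # Sort by the selected utility, highest first (stable for equal scores).
    utility = POLICIES[policy]
    return sorted(jobs, key=utility, reverse=True)


def pick_partition(job: Job, partitions: List[Partition]) -> Optional[Partition]:
    # Fault-aware choice (assumed rule): among partitions large enough for the job,
    # take the lowest predicted failure risk, then the smallest size to limit waste.
    candidates = [p for p in partitions if p.size >= job.nodes]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.failure_prob, p.size))


jobs = [
    Job("a", 512, 3600, 900),
    Job("b", 1024, 600, 300),
    Job("c", 256, 7200, 1200),
]
partitions = [
    Partition("R00", 512, 0.05),
    Partition("R01", 512, 0.01),
    Partition("R02", 1024, 0.02),
]

print([j.job_id for j in order_queue(jobs, "fcfs")])  # ['c', 'a', 'b']
print([j.job_id for j in order_queue(jobs, "xf")])    # ['b', 'a', 'c']
print(pick_partition(jobs[0], partitions).name)       # R01
```

With the same three jobs, the `"fcfs"` policy runs the longest-waiting job `c` first, while the `"xf"` policy promotes the short job `b`. Only the policy name changed. A real system would combine several such terms (for example, weighted sums) and derive failure risk from failure logs or prediction models; both are left as assumptions here.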