Characterizing and Modeling Cloud Applications/Jobs on a Google Data Center

Title: Characterizing and Modeling Cloud Applications/Jobs on a Google Data Center
Publication Type: Journal Article
Year of Publication: 2014
Authors: Di, S, Kondo, D, Cappello, F
Journal: Journal of Supercomputing
Date Published: 01/2014
Other Numbers: ANL/MCS-P5074-0214
Abstract: In this paper, we characterize and model Google applications and jobs, based on a one-month trace from a large-scale Google data center. We make four contributions: (1) we compute statistics on task events and resource utilization for Google applications, across various resource types and execution types; (2) we classify applications via a K-means clustering algorithm with an optimized number of sets, based on task events and resource usage; (3) we study the correlation between Google application properties and running features (e.g., job priority and scheduling class); (4) finally, we build a model that can simulate Google jobs/tasks and dynamic events in accordance with the Google trace. Experiments show that tasks simulated with our model exhibit features fairly similar to those in the Google trace. At least 95% of task simulation errors are less than 20%, confirming the high accuracy of our simulation model.