Argonne National Laboratory

Workload Characterization for MG-RAST Metagenomic Data Analytics Service in the Cloud

Title: Workload Characterization for MG-RAST Metagenomic Data Analytics Service in the Cloud
Publication Type: Conference Proceedings
Year of Publication: 2014
Authors: Tang, W, Bischof, J, Desai, NL, Mahadik, K, Gerlach, W, Harrison, T, Wilke, A, Meyer, F
Conference Name: 2014 IEEE International Conference on Big Data (Big Data)
Pagination: 56-63
Publisher: IEEE Xplore
DOI: 10.1109/BigData.2014.7004394
Other Numbers: ANL/MCS-P5202-0914
Abstract: The cost of DNA sequencing has plummeted in recent years. The consequent data deluge has imposed heavy burdens on data analysis applications. For example, MG-RAST, a production open-public metagenome annotation service, has experienced an increasingly large volume of data submissions and demands scalable resources to meet its computational needs. To address this problem, we have developed a scalable platform to port MG-RAST workloads into the cloud, where elastic computing resources can be used on demand. To utilize such resources efficiently, however, one must understand the characteristics of the application workloads. In this paper, we characterize the MG-RAST workloads running in the cloud from the perspectives of computation, I/O, and data transfer. Insights from this work will help guide application enhancement, service operation, and resource management for MG-RAST and similar big data applications that demand elastic computing resources.
PDF: http://www.mcs.anl.gov/papers/P5202-0914.pdf