Title | Workload Characterization for MG-RAST Metagenomic Data Analytics Service in the Cloud |
Publication Type | Conference Proceedings |
Year of Publication | 2014 |
Authors | Tang, W, Bischof, J, Desai, NL, Mahadik, K, Gerlach, W, Harrison, T, Wilke, A, Meyer, F |
Conference Name | Big Data (Big Data), 2014 IEEE International Conference |
Pagination | 56 - 63 |
Publisher | IEEE Xplore |
DOI | 10.1109/BigData.2014.7004394 |
Other Numbers | ANL/MCS-P5202-0914 |
Abstract | The cost of DNA sequencing has plummeted in recent years. The consequent data deluge has imposed big burdens on data analysis applications. For example, MG-RAST, a production open-public metagenome annotation service, has experienced an increasingly large volume of data submissions and has demanded scalable resources for its computational needs. To address this problem, we have developed a scalable platform to port MG-RAST workloads into the cloud, where elastic computing resources can be used on demand. To efficiently utilize such resources, however, one must understand the characteristics of the application workloads. In this paper, we characterize the MG-RAST workloads running in the cloud, from the perspectives of computation, I/O, and data transfer. Insights from this work will help guide application enhancement, service operation, and resource management for MG-RAST and similar big data applications demanding elastic computing resources. |
PDF | http://www.mcs.anl.gov/papers/P5202-0914.pdf |