A Scalable Data Analysis Platform for Metagenomics
|Publication Type||Conference Paper|
|Year of Publication||2013|
|Authors||Tang, W; Wilkening, J; Desai, NL; Gerlach, W; Wilke, A; Meyer, F|
|Conference Name||The Proceedings of the 2013 IEEE International Conference on Big Data|
|Conference Location||Santa Clara, CA|
With the advent of high-throughput DNA sequencing technology, the analysis and management of the growing volume of biological sequence data have become a bottleneck for scientific progress. For example, MG-RAST, a metagenome annotation system serving a large scientific community worldwide, has experienced sustained exponential growth in data submissions for several years, and this trend is expected to continue. To address the computational challenges posed by this workload, we developed a new data analysis platform comprising a data management system (Shock) for biological sequence data and a workflow management system (AWE) that supports scalable, fault-tolerant task and resource management. Shock and AWE can be used to build a scalable and reproducible data analysis infrastructure for upper-level biological data analysis services.