Argonne National Laboratory

A Scalable Data Analysis Platform for Metagenomics

Title: A Scalable Data Analysis Platform for Metagenomics
Publication Type: Conference Paper
Year of Publication: 2013
Authors: Tang, W, Wilkening, J, Desai, NL, Gerlach, W, Wilke, A, Meyer, F
Conference Name: The Proceedings of the 2013 IEEE International Conference on Big Data
Conference Location: Santa Clara, CA
Other Numbers: ANL/MCS-P5012-0913

With the advent of high-throughput DNA sequencing technology, the analysis and management of the increasing amount of biological sequence data have become a bottleneck for scientific progress. For example, MG-RAST, a metagenome annotation system serving a large scientific community worldwide, has experienced sustained, exponential growth in data submissions for several years, and this trend is expected to continue. To address the computational challenges posed by this workload, we developed a new data analysis platform that includes a data management system (Shock) for biological sequence data and a workflow management system (AWE) that supports scalable, fault-tolerant task and resource management. Shock and AWE can be used to build a scalable and reproducible data analysis infrastructure for upper-level biological data analysis services.