D. A. Antonopoulos, E. M. Glass, and F. Meyer, "Analyzing Metagenomic Data: Inferring Microbial Community Function with MG-RAST," Preprint ANL/MCS-P1800-1010, October 2010. [pdf]
The application of massively parallel, high-throughput DNA sequencing technologies to the generation of metagenomic datasets from environmental samples is transforming the field of microbiology. Whereas traditional (Sanger-based) DNA sequencing technology imposed a high economic cost on data generation, the development of "next-generation" technologies now makes feasible the large-scale generation of sequence data required for studying complex microbial communities. Therefore, molecular approaches to inferring the structure of microbial communities, based on cataloging PCR-amplified small subunit ribosomal RNA (SSU rRNA) encoding genes, can now be complemented by inferring the function of these communities via shotgun sequencing strategies. However, analyzing sequence data at this scale poses significant hurdles, including the need for (1) efficient strategies for identifying gene content (annotation), (2) web-based interfaces for comparing datasets from different samples, and (3) statistical methods to guide the identification of relevant gene sets for further study. The MG-RAST (MetaGenome Rapid Annotation using Subsystems Technology) system is one solution that has found widespread use in the analysis of metagenome-derived datasets. This chapter discusses the underlying structure of the publicly accessible MG-RAST resource and how it addresses these hurdles. It also identifies future challenges arising from the expected increase in data output from DNA sequencing platforms.