E. M. Glass and F. Meyer, "Analysis of Metagenomics Data," Bioinformatics for High Throughput Sequencing, Springer, 2011. Also Preprint ANL/MCS-P1856-0311, March 2011.
Improved sampling of diverse environments and advances in the development and application of next-generation sequencing technologies are accelerating the rate at which new metagenomes are produced. Over the past few years, the major challenge associated with metagenomics has shifted from generating sequences to analyzing them. Metagenomic analysis includes the identification and the functional and evolutionary analysis of the genomic sequences of a community of organisms. The analysis of these data sets involves many challenges, including sparse metadata, a high volume of sequence data, genomic heterogeneity, and incomplete sequences. Because of the nature of metagenomic data, analysis is very complex and requires new approaches and significant compute resources. Recently, several computational systems and tools have been developed and applied to analyze the functional and phylogenetic composition of metagenomes. The metagenomics RAST server (MG-RAST) is a high-throughput system built to provide high-performance computing to researchers interested in analyzing metagenomic data. Automated functional assignments of sequences in the metagenome are generated by comparing them against both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are constructed, and statistical tools for comparative metagenomics are provided. MG-RAST provides a collaborative environment that allows for user privacy and management. In MG-RAST, all users retain full control of their data, and everything is available for download in a variety of formats. This service has removed one of the primary bottlenecks in metagenome sequence analysis: the availability of high-performance computing for annotating data.