Argonne Technology Aids Largest Metagenome Study to Date

May 12, 2008

Researchers at Argonne National Laboratory recently completed a study that compared more than 14 million microbial sequences from 90 metagenomes collected from nine different biomes, or ecosystems. Unlike other metagenomic studies, this project analyzed samples ranging from the deep sea to the bacterial communities inside the human gut.

According to project leader Rob Edwards, a computational biologist at Argonne and San Diego State University, a metagenomic project on this scale would have been cost-prohibitive with Sanger sequencing technology and infeasible with the ultra-short-read next-generation platforms, because the team needed reads of at least 100 bases to find significant hits in their databases. Edwards and his colleagues opted for Roche's 454 pyrosequencing to complete all of the sequencing runs for one of the largest sets of sequences ever analyzed.

Analysis was performed with RAST (Rapid Annotation using Subsystem Technology), a high-throughput pipeline that researchers at Argonne designed specifically for metagenome sequencing. To make sense of the millions of sequences collected in the study, Edwards and his colleagues at the Fellowship for Interpretation of Genomes developed a database called SEED containing all known protein and DNA sequences. Using this database, the researchers then identified matches between the metagenomes and the sequences in SEED. Until the launch of SEED and RAST, there was no one-stop resource where users could upload their data and obtain analyses in several different formats.

Originally, the investigators had assumed that the study would reveal similarities in metabolic function between the viral and microbial metagenomes across the different biomes. "One of the big things that was present in our paper that has never been done before is a functional analysis, so we looked for what metabolic genes are present in different environments," Edwards says. "One of the key points of our paper is that we can actually discriminate different environments based on what the bacteria are doing." Eventually, the researchers found that more than 1 million sequences from the microbial metagenomes, and more than 500,000 from the viral metagenomes, were substantially similar to functional genes within the SEED database.

Despite the large number of samples used for this study, Edwards says that the sequencing wasn't the hard part. "Extracting the DNA and generation of the sequences is reasonably trivial, because there are standard protocols for doing that," he says. "The big challenge is analyzing the data and understanding what the analysis means."