Argonne National Laboratory Mathematics and Computer Science Division
Argonne Home > MCS Division >

Publications

A. Wilke, T. Harrison, J. Wilkening, D. Field, E. M. Glass, N. Kyrpides, K. Mavrommatis, F. Meyer, "The M5nr: A Novel Non-Redundant Database Containing Protein Sequences and Annotations from Multiple Sources and Associated Tools," Preprint ANL/MCS-P2044-0212, February 2012. [pdf]

Background: Computing of sequence similarity results is becoming a limiting factor in metagenome analysis. Sequence similarity search results encoded in an open, exchangeable format have the potential to limit the needs for computational reanalysis of these data sets. A prerequisite for sharing of similarity results is a common reference.

Results: We introduce a mechanism for automatically maintaining a comprehensive, non-redundant protein database and for creating a quarterly release of this resource. In addition, we present tools for translating similarity searches into many annotation names paces, e.g. KEGG or NCBI's GenBank.

Conclusions: The data and tools we present allow the creation of multiple result sets using a single computation, permitting computational results to be shared between groups for large sequence data sets.


The Office of Advanced Scientific Computing Research | UChicago Argonne LLC | Privacy & Security Notice | ContactUs