R. Overbeek, R. Aziz, D. Bartels, T. Disz, R. Edwards, S. Gerdes, C. Henry, G. Olsen, R. Olson, A. Osterman, T. Paczian, B. Parrello, G. D. Pusch, A. Rodriguez, R. Stevens, O. Vassieva, V. Vonstein, A. Wilke, and O. Zagnitko, "Programmatic Access to the SEED Data Via the Network," Preprint ANL/MCS-P1757-0610, June 2010.
The SEED project is a cooperative gene annotation effort initiated in 2003. Researchers from a number of academic and private institutions built the SEED, an integration of genomic data that now contains almost a thousand complete or nearly complete genomes, a constantly updated set of curated annotations embodied in a large and growing collection of encoded subsystems, and a derived set of protein families. All of the SEED code and data are made freely available. Until recently, however, maintaining current copies of the SEED code and data at remote locations was a pressing issue. This paper describes four network-based servers that address this issue. Specifically, the servers are intended to expose the data in the underlying relational database, support basic annotation services, offer programmatic access to the capabilities of the RAST annotation server, and provide access to a growing collection of metabolic models that support flux balance analysis. Moreover, the four servers offer access to regularly updated data, the ability to annotate prokaryotic genomes, the ability to create metabolic reconstructions and detailed models of metabolism, and access to hundreds of existing metabolic models. Our goal is to support a framework upon which other groups can build independent research efforts. Large integrations of genomic data represent one of the major intellectual resources driving research in biology, and we believe that programmatic access to the SEED data will provide significant utility to a broad collection of potential users.