Connecting Sequence Data to Virulence Factors in Streptococcus Genomes

Publication Type: Report
Year of Publication: 2014
Authors: Alagarsamy, J, Seetharaman, R, Sivanandham, C, Krishnan, KChella, Overbeek, RA
Document Number: ANL/MCS-TM-342

The SEED Project was started over a decade ago to focus on creating more accurate annotations of prokaryotic genomes, along with the tools needed to support such an effort. A number of research teams have participated, and numerous projects were based on the evolving technology that was cooperatively built. Some of these teams sought funding to support comparative analysis of pathogens, and a number did successfully acquire funding from NSF, NIH, and DOE.
While it is often asserted that we cannot afford to support manual annotation efforts, we counter with the following simple argument:
1. Accurate automated annotations will directly impact the value of the hundreds of thousands of genomes that will be sequenced during the next few years; and
2. The quality of automated annotations is directly related to the availability of accurately annotated reference genomes.
While it is true that most manual annotation efforts could have achieved far higher efficiency, it is also true that extraction of value from newly sequenced genomes will depend heavily on a carefully curated set of subsystems built upon the annotations of a set of reference genomes.