Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services

Title: Experiences Building Globus Genomics: A Next-Generation Sequencing Analysis Service using Galaxy, Globus, and Amazon Web Services
Publication Type: Journal Article
Year of Publication: 2014
Authors: Madduri, RK, Sulakhe, D, Lacinski, L, Liu, B, Rodriguez, A, Chard, K, Dave, UJ, Foster, IT
Journal: Concurrency and Computation: Practice & Experience
Other Numbers: ANL/MCS-P5107-0314
Abstract

We describe Globus Genomics, a system that we have developed for rapid analysis of large quantities of next-generation sequencing (NGS) genomic data. This system achieves a high degree of end-to-end automation that encompasses every stage of data analysis including initial data retrieval from remote sequencing centers or storage (via the Globus file transfer system); specification, configuration, and reuse of multi-step processing pipelines (via the Galaxy workflow system); creation of custom Amazon Machine Images and on-demand resource acquisition via a specialized elastic provisioner (on Amazon EC2); and efficient scheduling of these pipelines over many processors (via the HTCondor scheduler). The system allows biomedical researchers to perform rapid analysis of large NGS datasets in a fully automated manner, without software installation or a need for any local computing infrastructure. We report performance and cost results for some representative workloads.

PDF: http://www.mcs.anl.gov/papers/P5107-0314.pdf