Seminar Details:

LANS Informal Seminar
"Building an Integrated Big Data Analysis Platform for Genomic Sciences and Addressing the Resource Management Challenges in the Cloud"

DATE: August 13, 2014
TIME: 15:00:00 - 16:00:00
SPEAKER: Wei Tang, Postdoctoral Appointee, IGSB, Argonne National Laboratory
LOCATION: Building 240 Room 1404-1405, Argonne National Laboratory

Next-Generation Sequencing (NGS) has cut the cost of DNA sequencing dramatically, shifting the bottleneck of genomic sciences from data generation to data analysis, which requires ever-increasing computing capacity. Meanwhile, the resulting data deluge poses challenges for several human roles in computational genomic sciences, including bioinformatics tool developers, workflow builders, data analysis service operators, and computing resource administrators. To address these problems, we have developed an integrated platform, comprising the Shock data management system and the AWE workload management system, which supports reusable sequence data management and accelerated workflow execution on scalable, distributed computing resources. With Shock/AWE, we have ported the MG-RAST pipeline, a popular metagenome analysis service, into the cloud and achieved scalable throughput. However, resource management challenges remain in the cloud, especially when data movement between multiple sites plays an important role.

In this talk, I will first discuss the data deluge problems in genomic sciences and our open-source data analysis platform, which supports integrated management of applications, services, data, and computing resources. I will then focus on the resource management aspect, describing the observed problems and our efforts to address them, including 1) MG-RAST workload characterization to understand application needs in the cloud, and 2) a data-aware distributed workflow scheduling mechanism, along with a workflow simulator built on top of the CODES/ROSS simulation framework, which can provide effective capacity planning and task allocation in multi-cloud environments.


Please send questions or suggestions to Jeffrey Larson: jmlarson at anl dot gov.