Argonne National Laboratory

A Case Study in Using Discrete-Event Simulation to Improve the Scalability of MG-RAST

Title: A Case Study in Using Discrete-Event Simulation to Improve the Scalability of MG-RAST
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Ross, C; Mubarak, M; Jenkins, J; Carns, PH; Carothers, CD; Ross, RB; Tang, W; Gerlach, W; Meyer, F
Conference Name: SIGSIM-PADS '16
Date Published: 05/2016
Conference Location: Banff, Canada
Other Numbers: ANL/MCS-P5572-0316
Abstract: As the cost of DNA sequencing has decreased, computational biology data processing platforms are experiencing an increasingly large volume of data analysis requests. The metagenomics analysis server MG-RAST at Argonne National Laboratory, a computational biology data processing platform, is receiving several terabytes of data submissions per month. However, MG-RAST currently relies on a central object-based data store, Shock, for data access and storage that can become a bottleneck under high data transfer loads, adversely affecting the job response time for end users. In this work, we use a discrete-event simulation approach to explore the use of data proxies and an enhanced, proxy-aware scheduling methodology designed to reduce the movement of the intermediate data generated during workflow processing. In this approach, Shock is supplemented with proxy storage servers, employing solid state drives, to decentralize the management and hence reduce the movement of intermediate workflow results. Discrete-event simulation provides a way to evaluate the performance of MG-RAST with increased workloads without disrupting the production system. For our case study, we extrapolate scientific workflows obtained from MG-RAST to represent future usage trends. We demonstrate that the addition of proxies and the proxy-aware scheduling methodology significantly reduces the data movement overhead by distributing the data plane, leading to substantial improvement in end-user job response time.
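
The idea in the abstract, that a single storage server serializes data transfers while several proxy servers spread them out, can be illustrated with a toy discrete-event simulation. The sketch below is not the model used in the paper. The function name `simulate`, its parameters, the one-transfer-at-a-time server model, and the rule that job j uses proxy j mod N are all assumptions made for illustration, and as a simplification every transfer (not only intermediate data) goes to a proxy when proxies are enabled.

```python
import heapq


def simulate(num_jobs, stages, compute_time, transfer_time,
             arrival_gap=0.0, num_proxies=0):
    """Toy discrete-event model; returns the mean job response time.

    Assumptions (hypothetical, not taken from the paper):
    - each job runs `stages` stages; every stage computes for
      `compute_time`, then transfers its output for `transfer_time`;
    - a storage server handles one transfer at a time, in request order;
    - num_proxies == 0 means all jobs share one central store, otherwise
      job j always uses proxy j % num_proxies.
    """
    servers = max(1, num_proxies)
    free_at = [0.0] * servers          # time at which each server is idle
    arrivals = [j * arrival_gap for j in range(num_jobs)]

    # Event = (time of transfer request, tie-breaker, job id, stage index)
    events = []
    seq = 0
    for j in range(num_jobs):
        heapq.heappush(events, (arrivals[j] + compute_time, seq, j, 0))
        seq += 1

    finish = {}
    while events:
        now, _, job, stage = heapq.heappop(events)
        server = job % servers
        start = max(now, free_at[server])   # wait if the server is busy
        done = start + transfer_time
        free_at[server] = done
        if stage + 1 == stages:
            finish[job] = done               # last stage output stored
        else:
            # next stage computes, then issues its own transfer request
            heapq.heappush(events, (done + compute_time, seq, job, stage + 1))
            seq += 1

    return sum(finish[j] - arrivals[j] for j in range(num_jobs)) / num_jobs


if __name__ == "__main__":
    central = simulate(num_jobs=4, stages=3, compute_time=1.0, transfer_time=2.0)
    proxied = simulate(num_jobs=4, stages=3, compute_time=1.0, transfer_time=2.0,
                       num_proxies=4)
    print("central store mean response:", central)
    print("with proxies mean response: ", proxied)
```

Because events are processed in timestamp order, each server serves requests first come, first served. With one proxy per job there is no queueing, so response time reduces to the sum of compute and transfer times; with a single shared store the queueing delay grows with the number of concurrent jobs.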