Design and Analysis of Data Management in Scalable Parallel Scripting

Title: Design and Analysis of Data Management in Scalable Parallel Scripting
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Zhang, Z; Katz, DS; Wozniak, JM; Espinosa, A; Foster, IT
Conference Name: SC'12
Date Published: 11/2012
Conference Location: Salt Lake City, UT
Other Numbers: ANL/MCS-P3012-0712

We seek to enable efficient large-scale parallel execution of applications in which a shared filesystem abstraction is used to couple many tasks. Such parallel scripting, or many-task computing (MTC), applications suffer from poor performance and utilization on large parallel computers because of the volume of filesystem I/O and the lack of appropriate optimizations in the shared filesystem. We therefore design and implement a scalable MTC data management system that uses aggregated compute-node local storage to support more efficient data movement strategies. We co-design the data management system with a data-aware scheduler to enable dataflow pattern identification and automatic optimization. The framework reduces the time-to-solution of the parallel stages of an astronomy data analysis application, Montage, by 83.2% on 512 cores; decreases the time-to-solution of a seismology application, CyberShake, by 7.9% on 2,048 cores; and delivers BLAST performance better than mpiBLAST at various scales up to 32,768 cores, while preserving the flexibility of the original BLAST application.
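
The percentage reductions quoted above can be read as speedup factors. The sketch below is only an illustration and assumes that "reduces time-to-solution by X%" means the new time equals (1 - X/100) of the baseline time; the function name `speedup_from_reduction` is hypothetical and does not come from the paper. The Montage figure applies to the parallel stages only, not to the whole application.

```python
def speedup_from_reduction(reduction_percent: float) -> float:
    """Convert a percentage reduction in time-to-solution into a speedup factor.

    Assumes new_time = baseline_time * (1 - reduction_percent / 100),
    so speedup = baseline_time / new_time = 1 / (1 - reduction_percent / 100).
    """
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction_percent must be in [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


# Reductions reported in the abstract.
montage = speedup_from_reduction(83.2)     # parallel stages, 512 cores
cybershake = speedup_from_reduction(7.9)   # 2,048 cores

print(f"Montage parallel stages: {montage:.2f}x")
print(f"CyberShake: {cybershake:.2f}x")
```

Under that assumption, an 83.2% reduction corresponds to roughly a 6x speedup of the Montage parallel stages, while a 7.9% reduction corresponds to about a 1.09x speedup for CyberShake.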