Design and Analysis of Data Management in Scalable Parallel Scripting
|Title|Design and Analysis of Data Management in Scalable Parallel Scripting|
|Publication Type|Conference Paper|
|Year of Publication|2012|
|Authors|Zhang, Z, Katz, DS, Wozniak, JM, Espinosa, A, Foster, IT|
|Conference Location|Salt Lake City, UT|
We seek to enable efficient large-scale parallel execution of applications in which a shared filesystem abstraction is used to couple many tasks. Such parallel scripting, or many-task computing (MTC), applications suffer poor performance and utilization on large parallel computers because of the volume of filesystem I/O and a lack of appropriate optimizations in the shared filesystem. We therefore design and implement a scalable MTC data management system that uses aggregated compute-node local storage to support more efficient data movement strategies. We co-design the data management system with the data-aware scheduler to enable dataflow pattern identification and automatic optimization. The framework reduces the time-to-solution of the parallel stages of an astronomy data analysis application, Montage, by 83.2% on 512 cores; decreases the time-to-solution of a seismology application, CyberShake, by 7.9% on 2,048 cores; and delivers BLAST performance better than mpiBLAST at various scales up to 32,768 cores, while preserving the flexibility of the original BLAST application.
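The reported reductions in time-to-solution can also be read as speedup factors. The sketch below is only an illustration of that arithmetic, not part of the paper: it assumes the percentages compare the optimized run against a baseline run of the same stages, and the function name `speedup_from_reduction` is hypothetical.

```python
def speedup_from_reduction(reduction_percent: float) -> float:
    """Convert a percentage reduction in time-to-solution into a speedup factor.

    If the new time is (1 - r) times the old time, the speedup is 1 / (1 - r).
    """
    if not 0 <= reduction_percent < 100:
        raise ValueError("reduction must be in the range [0, 100)")
    return 1.0 / (1.0 - reduction_percent / 100.0)


# Reductions quoted in the abstract (assumed to be relative to a baseline run).
reported = [("Montage, 512 cores", 83.2), ("CyberShake, 2,048 cores", 7.9)]

for label, reduction in reported:
    print(f"{label}: {speedup_from_reduction(reduction):.2f}x")
```

Under that assumption, an 83.2% reduction corresponds to roughly a 5.95x speedup and a 7.9% reduction to roughly 1.09x.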