A Visual Network Analysis Method for Large Scale Parallel I/O Systems

Title: A Visual Network Analysis Method for Large Scale Parallel I/O Systems
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Sigovan, C, Muelder, C, Ma, K, Cope, J, Iskra, K, Ross, RB
Conference Name: International Parallel and Distributed Processing Symposium (IPDPS 2013)
Other Numbers: ANL/MCS-P3042-1012

Parallel applications rely on I/O to load data, store end results, and protect partial results from being lost to system failure. Parallel I/O performance thus has a direct and significant impact on application performance. Because supercomputer I/O systems are large and complex, one cannot directly analyze their activity traces. While several visual or automated analysis tools for large-scale HPC log data exist, analysis research in the high-performance computing field is geared toward computation performance rather than I/O performance. Additionally, existing methods usually do not capture the network characteristics of HPC I/O systems. We present a visual analysis method for I/O trace data that takes into account the fact that HPC I/O systems can be represented as networks. We illustrate performance metrics in a way that facilitates the identification of abnormal behavior or performance problems. We demonstrate our approach on I/O traces collected from existing systems at different scales.