A Visual Network Analysis Method for Large Scale Parallel I/O Systems

Title: A Visual Network Analysis Method for Large Scale Parallel I/O Systems
Publication Type: Conference Paper
Year of Publication: 2012
Authors: Sigovan, C, Muelder, C, Ma, K, Cope, J, Iskra, K, Ross, RB
Conference Name: International Parallel and Distributed Processing Symposium (IPDPS 2013)
Publisher: IEEE
Other Numbers: ANL/MCS-P3042-1012
Abstract

Parallel applications rely on I/O to load data, store end results, and protect partial results from being lost to system failure. Parallel I/O performance thus has a direct and significant impact on application performance. Because supercomputer I/O systems are large and complex, one cannot directly analyze their activity traces. While several visual or automated analysis tools for large-scale HPC log data exist, analysis research in the high-performance computing field is geared toward computation performance rather than I/O performance. Additionally, existing methods usually do not capture the network characteristics of HPC I/O systems. We present a visual analysis method for I/O trace data that takes into account the fact that HPC I/O systems can be represented as networks. We illustrate performance metrics in a way that facilitates the identification of abnormal behavior or performance problems. We demonstrate our approach on I/O traces collected from existing systems at different scales.

PDF: http://www.mcs.anl.gov/papers/3042-1012.pdf