Argonne National Laboratory

Analysis and Correlation of Application I/O Performance and System-Wide I/O Activity

Title: Analysis and Correlation of Application I/O Performance and System-Wide I/O Activity
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Madireddy, S, Balaprakash, P, Carns, PH, Latham, R, Ross, R, Snyder, S, Wild, SM
Conference Name: 2017 International Conference on Networking, Architecture, and Storage (NAS)
Date Published: 08/2017
Conference Location: Shenzhen, China
Abstract: Storage resources in high-performance computing are shared across all user applications. Consequently, storage performance can vary markedly, depending not only on an application’s workload but also on what other activity is concurrently running across the system. This variability in storage performance is directly reflected in overall execution time variability, thus confounding efforts to predict job performance for scheduling or capacity planning. I/O variability also complicates the seemingly straightforward process of performance measurement when evaluating application optimizations. In this work we present a methodology to measure I/O contention with more rigor than in prior work. We apply statistical techniques to gain insight from application-level statistics and storage-side logging. We examine different correlation metrics for relating system workload to job I/O performance and identify an effective and generally applicable metric for measuring job I/O performance. We further demonstrate that the system-wide monitoring granularity can directly affect the strength of correlation observed. Insufficient granularity and measurements can hide the correlations between application I/O performance and system-wide I/O activity.