Argonne National Laboratory

Analysis and Correlation of Application I/O Performance and System-Wide I/O Activity

Title: Analysis and Correlation of Application I/O Performance and System-Wide I/O Activity
Publication Type: Report
Year of Publication: 2017
Authors: Madireddy S, Balaprakash P, Carns PH, Latham R, Ross R, Snyder S, Wild SM
Report Number: ANL/MCS-P7036-0417

Storage resources in high-performance computing are shared across all user applications. Consequently, storage performance can vary markedly, depending not only on an application’s workload but also on what other activity is concurrently running across the system. This variability in storage performance is directly reflected in overall execution time variability, thus confounding efforts to predict job performance for scheduling or capacity planning. I/O variability also complicates the seemingly straightforward process of performance measurement when evaluating application optimizations. In this work we present a methodology to measure I/O contention with more rigor than in prior work. We apply statistical techniques to gain insight from application-level statistics and storage-side logging. We examine different correlation metrics for relating system workload to job I/O performance and identify an effective and generally applicable metric for measuring job I/O performance. We further demonstrate that the system-wide monitoring granularity can directly affect the strength of correlation observed. Insufficient granularity and measurements can hide the correlations between application I/O performance and system-wide I/O activity.
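
The point about monitoring granularity can be illustrated with a small synthetic sketch. Nothing below comes from the report: the data are randomly generated, the linear throughput model, the interval length, and the job window are all hypothetical, and Pearson correlation is used only as a familiar stand-in, since the abstract does not name the metric the authors selected.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(num_jobs=200, seed=0):
    """Return (r_fine, r_coarse) for synthetic jobs.

    r_fine   correlates job throughput with system load averaged over
             the job's own I/O window (fine-grained monitoring).
    r_coarse correlates it with load averaged over the whole logging
             interval (coarse-grained monitoring).
    """
    rng = random.Random(seed)
    samples_per_interval = 60    # hypothetical: one coarse record spans 60 fine samples
    job_start, job_end = 20, 30  # hypothetical: the job does I/O during samples 20..29

    throughput, fine_load, coarse_load = [], [], []
    for _ in range(num_jobs):
        # Synthetic system-wide load, one value per fine-grained sample.
        load = [rng.uniform(0.0, 100.0) for _ in range(samples_per_interval)]
        during_job = sum(load[job_start:job_end]) / (job_end - job_start)
        # Assumed toy model: throughput drops linearly with the load seen
        # while the job is running, plus a small amount of noise.
        throughput.append(100.0 - during_job + rng.uniform(-1.0, 1.0))
        fine_load.append(during_job)
        coarse_load.append(sum(load) / samples_per_interval)

    return pearson(fine_load, throughput), pearson(coarse_load, throughput)


r_fine, r_coarse = simulate()
print(f"fine-grained monitoring:   r = {r_fine:.3f}")
print(f"coarse-grained monitoring: r = {r_coarse:.3f}")
```

In this toy model the coarse average mixes in load from periods when the job was not performing I/O, so the observed correlation is weaker even though the underlying dependence is unchanged. Increasing `samples_per_interval` while keeping the job window fixed dilutes it further.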