Argonne National Laboratory

Anomaly Detection and Diagnosis in Grid Environments

Title: Anomaly Detection and Diagnosis in Grid Environments
Publication Type: Conference Paper
Year of Publication: 2007
Authors: Yang, L, Liu, C, Schopf, JM, Foster, IT
Conference Name: International Conference for High Performance Computing, Networking, Storage, and Analysis (SC07)
Conference Location: Reno, NV
Other Numbers: ANL/MCS-P1444-0707
Abstract: Identifying and diagnosing anomalies in application behavior is critical to delivering reliable application-level performance. In this paper we introduce a strategy to detect anomalies and diagnose the possible reasons behind them. Our approach extends the traditional window-based strategy by using signal-processing techniques to filter out recurring background fluctuations in resource behavior. In addition, we have developed a diagnosis technique that uses standard monitoring data to determine where related changes in behavior occur at the times of the anomalies. We evaluate our anomaly detection and diagnosis technique by applying it in three contexts and inserting anomalies into the system at random intervals. The experimental results show that our strategy detects up to 96% of anomalies while reducing the false positive rate by up to 90% compared to the traditional window average strategy. In addition, our strategy can diagnose the reason for the anomaly approximately 75% of the time.
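For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of one common form of window-average detection. It is not the paper's method: this record does not give the authors' filtering or diagnosis algorithms, and they are not reproduced here. The function name `window_average_anomalies`, the parameters `window`, `k`, and `min_sigma`, and the k-standard-deviations threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def window_average_anomalies(samples, window=5, k=3.0, min_sigma=1e-9):
    """Flag indices that deviate strongly from the preceding window.

    Hypothetical baseline sketch: a sample is flagged when it differs from
    the mean of the previous `window` samples by more than `k` standard
    deviations of that window. `min_sigma` (an assumed floor) keeps the
    threshold positive when the window is perfectly flat.
    """
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]      # the preceding `window` samples
        mu = mean(history)
        sigma = max(pstdev(history), min_sigma)
        if abs(samples[i] - mu) > k * sigma:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # A flat load trace with one injected spike at index 5.
    trace = [10, 10, 10, 10, 10, 50, 10, 10, 10, 10, 10, 10]
    print(window_average_anomalies(trace))
```

A detector of this kind reacts to any large deviation from the recent average, so recurring background fluctuations (for example, periodic load) can trigger it repeatedly. That is the false positive problem the abstract says the signal-processing filter is intended to reduce.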