Argonne National Laboratory Mathematics and Computer Science Division

Seminars & Events


Computing, Environment and Life Sciences/Mathematics and Computer Science Division
"Log Analysis for Reliability Management in Large-Scale Systems"

DATE: March 28, 2011
TIME: 10:30 AM - 11:30 AM
SPEAKER: Ziming Zheng, PhD Candidate at Illinois Institute of Technology
LOCATION: Bldg. 240, Room 4301, Argonne National Laboratory
HOST: Mark Hereld

Description:
Abstract:

As HPC systems grow in scale and complexity, reliability is becoming a critical concern. System logs are the primary source of information for understanding and analyzing system problems. However, little work has been done on automated log analysis for HPC systems. In this talk, I will summarize our study of system logs collected from production HPC systems using data mining and statistical learning techniques.

Our work can be broadly divided into four parts: log pre-processing, online failure prediction, automatic root cause diagnosis, and reliability modeling. This work can improve our understanding of the faults, errors, and failures that arise from hardware and software components, and from their interactions, in HPC systems. It can also support further resilience research for large-scale systems.

More Information:
Bio: Ziming Zheng is a PhD candidate at Illinois Institute of Technology, under the supervision of Dr. Zhiling Lan. He received his BS and MS degrees from the College of Computer Science and Engineering, University of Electronic Science and Technology of China. His research interest is fault resilience for large-scale systems. He was an intern at Argonne National Laboratory and Oak Ridge National Laboratory for the CIFTS project. More details about Ziming Zheng are available at http://www.iit.edu/~zzheng11/.

