Seminars & Events
Mathematics and Computer Science Division
"Black-Box Localization of Storage Problems in Parallel File Systems"
DATE: December 15, 2011
TIME: 10:00 AM - 11:00 AM
SPEAKER: Mike Kasick, PhD Student, Electrical & Computer Engineering, Carnegie Mellon University
LOCATION: Building 240, Conference Room 4301, Argonne National Laboratory
HOST: Rob Ross
Description:
This talk focuses on localizing storage-stack problems in parallel file systems by identifying, gathering, and analyzing OS-level, black-box performance metrics on every server node in the cluster. Our peer-comparison diagnosis approach compares the statistical attributes of these metrics across file servers to identify faulty disk arrays, storage controllers, and server nodes. We validate our approach by triggering real storage problems in a GPFS cluster running three different file-system benchmarks (dd, IOzone, and PostMark). We further demonstrate localization of storage problems through a preliminary analysis of our approach on the Intrepid storage cluster at Argonne National Laboratory.
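For readers unfamiliar with peer comparison, the general idea can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the function name, the choice of median absolute deviation as the statistic, the cutoff of 3.0, and the sample readings are all assumptions made for this example.

```python
from statistics import median


def find_outlier_servers(samples, threshold=3.0):
    """Flag servers whose typical metric value differs from that of their peers.

    samples:   dict mapping a server name to a list of readings of one
               OS-level metric (hypothetical data, e.g. I/O wait time in ms).
    threshold: how many median absolute deviations (MADs) from the peer
               median counts as anomalous (an assumed cutoff).
    """
    # Summarize each server by the median of its own readings.
    per_server = {name: median(values) for name, values in samples.items()}

    # The peer median is the reference that every server is compared against.
    peer_median = median(per_server.values())

    # MAD: a robust measure of how much healthy peers normally differ.
    mad = median(abs(m - peer_median) for m in per_server.values())

    if mad == 0:
        # Peers agree exactly, so any deviation at all stands out.
        return sorted(n for n, m in per_server.items() if m != peer_median)

    return sorted(
        n for n, m in per_server.items()
        if abs(m - peer_median) / mad > threshold
    )


# Hypothetical readings from four file servers under the same workload.
samples = {
    "fs1": [5, 6, 5, 7],
    "fs2": [6, 5, 6, 6],
    "fs3": [5, 5, 6, 7],
    "fs4": [40, 42, 39, 45],
}
suspects = find_outlier_servers(samples)
print(suspects)  # ['fs4']
```

The sketch relies on the assumption that servers in a parallel file system see similar load under a striped workload, so a server whose metrics diverge from the majority is a candidate for closer inspection.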
Mike Kasick is a fifth-year PhD student in Electrical & Computer Engineering at Carnegie Mellon University. His research focuses on methods for minimally invasive, "production-ready" problem diagnosis of parallel file systems and storage clusters.
