Argonne National Laboratory Mathematics and Computer Science Division

Seminars & Events

"Containment Domains for Scalable and Efficient Resilience"

DATE: April 24, 2012
TIME: 10:30 AM - 11:30 AM
SPEAKER: Mattan Erez, Assistant Professor in the Department of Electrical and Computer Engineering at the University of Te
LOCATION: Building 240, Seminar Room 4301, Argonne National Laboratory
HOST: Andrew Chien

Description:
In this talk I will present a scalable and efficient resilience scheme based on the concept of Containment Domains. Containment domains are programming and system constructs that encapsulate and express an application's resilience needs and interact with the system to tune and specialize error detection, state preservation and restoration, and recovery schemes. Containment domains have weak transactional semantics and are nested to take advantage of the machine hierarchy and to enable distributed and hierarchical state preservation, restoration, and recovery as an alternative to non-scalable and inefficient checkpoint-restart (and its variants). One of the key motivations behind this work is the idea of proportionality, where the resources devoted to a feature are proportional to the needs of the application and scenario. Proportionality is critical to continued scaling and performance under the increasing constraints of bandwidth, power, and energy. Essentially, one-size-fits-all and worst-case design approaches are no longer sufficient for building reliable and efficient systems. Time permitting, I will describe additional projects in my group that enable proportional resilience and bandwidth usage in the memory system.
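
The nesting and the preserve/detect/restore/recover cycle described above might be pictured with a small sketch. This is only an illustration under stated assumptions, not the speaker's interface: the names `run_domain` and `DomainError`, the retry limit, the dictionary used as state, and the injected fault are all hypothetical.

```python
import copy


class DomainError(Exception):
    """Hypothetical signal: an error was detected inside a domain."""


def run_domain(state, body, max_retries=2):
    """Run body(state) inside a hypothetical containment domain.

    Preserve: snapshot the state on entry.
    Detect:   the body raises DomainError when it detects an error.
    Restore:  roll the state back to the snapshot.
    Recover:  re-execute the body; if retries run out, escalate the
              error to the enclosing (parent) domain.
    """
    snapshot = copy.deepcopy(state)                # preserve
    for _ in range(max_retries + 1):
        try:
            return body(state)                     # execute and detect
        except DomainError:
            state.clear()
            state.update(copy.deepcopy(snapshot))  # restore
    raise DomainError("retries exhausted, escalating to parent domain")


# Illustrative fault injection: the inner body fails this many times.
faults = {"inner": 1}


def inner(state):
    state["x"] += 1
    if faults["inner"] > 0:
        faults["inner"] -= 1
        raise DomainError("transient fault")
    return state["x"]


def outer(state):
    # Two child domains nested inside the parent domain.
    run_domain(state, inner)
    run_domain(state, inner)
    return state["x"]


state = {"x": 0}
result = run_domain(state, outer)
```

In this sketch a single transient fault is absorbed by the innermost domain, which rolls back and re-executes only its own work. Only when a child exhausts its retries does the parent restore its larger snapshot and redo both children, which is the hierarchical alternative to a global checkpoint-restart.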



