I am a Senior Software Developer in the Mathematics and Computer Science Division at Argonne National Laboratory.


Research Interests:

My current research interests lie in petascale computing: investigating fault tolerance in such high-end computing machines and working with middleware libraries and programming models.


Research:

I am currently working on the Coordinated Infrastructure for Fault Tolerant Systems (CIFTS) project, which aims to improve fault tolerance in current and emerging high-end computing machines. While most high-end systems do provide mechanisms for detecting, reporting and perhaps handling hardware and software faults, the individual components in the system perform these actions separately. Knowledge about occurring faults is seldom shared between different programs and almost never on a system-wide basis. A typical system contains numerous programs that could benefit from such knowledge, including applications, middleware libraries, job schedulers, file systems and math libraries. The CIFTS initiative provides the foundation necessary to enable systems to adapt to faults in a holistic manner. CIFTS achieves this through the Fault Tolerance Backplane (FTB), a unified management and communication framework that any program can use to publish fault-related information. I currently serve as a lead developer for the Fault Tolerance Backplane. This work is being done in collaboration with scientists, researchers and academics at Argonne National Laboratory, Oak Ridge National Laboratory, Lawrence Berkeley National Laboratory, Indiana University, Ohio State University and the University of Tennessee, Knoxville.



NOTE: This page does not get updated frequently and may not contain the most up-to-date information. You are welcome to contact me directly for specific information on CIFTS or anything else :)