Seminars & Events
Mathematics and Computer Science Division
"Supporting Integrated Data Services: A New Challenge for High-Performance Computing"
DATE: January 11, 2012
TIME: 10:30 AM - 11:30 AM
SPEAKER: Ron Oldfield, Principal Member of Technical Staff, Sandia National Laboratories
LOCATION: Building 240, Conference Room 4301, Argonne National Laboratory
HOST: Phil Carns
Description:
Over the past several years, there has been increasing interest in injecting a layer of compute resources between a high-performance computing application and the end storage devices. For some projects, the objective is to present the parallel file system with a reduced set of clients, making it easier for file-system vendors to support extreme-scale systems. In other cases, the objective is to use these resources as "staging areas" to aggregate data or cache bursty I/O operations, thus improving the "effective" I/O throughput seen by the application. Still others want to use these staging areas to perform "in-situ" analysis on data in transit between the application and the storage system. To simplify our discussion, we adopt the general term "Integrated Data Services" to represent these use cases for HPC compute resources.
Although there is great interest in providing integrated data services for HPC platforms, a number of issues exist that hinder these efforts on today's platforms:
- There is no standard, portable API to support data services across platforms.
- There is no scheduler or runtime support for dynamic data services.
- Security models sometimes hinder the use of data services.
- Very little has been done to address resilience issues created by data services.
In this talk, I describe R&D efforts at Sandia to address some of these issues. In particular, I will introduce the Network Scalable Service Interface (Nessie), a parallel remote-procedure call API designed to enable the rapid development of data services on a variety of HPC platforms. I will also briefly describe a number of data services created using Nessie, including a PnetCDF staging service, an SQL proxy service, and an in-transit analysis service for the CTH shock-physics code. Finally, I will discuss new research directions, including an investigation of how to address resilience issues created by the use of data services.
------- Bio ------
Ron A. Oldfield is a principal member of the technical staff at Sandia National Laboratories in Albuquerque, NM. He received a B.Sc. in computer science from the University of New Mexico in 1993. From 1993 to 1997, he worked in the computational sciences department of Sandia National Laboratories, where he specialized in seismic research and parallel I/O. He was the primary developer for the GONII-SSD (Gas and Oil National Information Infrastructure--Synthetic Seismic Dataset) project and a co-developer of "Salvo", an R&D 100 award-winning project to develop a 3D finite-difference prestack-depth migration algorithm for massively parallel architectures. From 1997 to 2003, he attended graduate school at Dartmouth College, receiving his Ph.D. in June 2003. In September 2003, he returned to Sandia to work in the Scalable Computing Systems department. He has been the PI of a number of I/O, resilience, and systems simulation projects and is currently project manager for the ASC/CSSE Scalable I/O Research at Sandia. His research interests include parallel and distributed computing, parallel I/O, resilience, and performance modeling.
