Project mission
Design, build, and distribute a scalable, unified high-end computing I/O forwarding software layer that can be adopted and supported by the DOE Office of Science and NNSA.
Goals
- Provide function shipping at the file system interface level (without requiring middleware) that enables asynchronous I/O and operation coalescing without jeopardizing the determinism of the computation
- Offload file system functionality from client processes, whether running on a lightweight or a full OS, to a variety of targets: another core or dedicated hardware on the same system, an I/O node on a conventional cluster, or a service node on a leadership-class system
- Reduce the number of file system operations and clients that the parallel file system sees
- Support any/all parallel file system solutions
- Integrate with MPI-IO and any hardware features designed to support efficient parallel I/O
Once developed, this framework will be available online under an open-source license. We will encourage its adoption both in production and for further research into I/O forwarding approaches.
Application I/O calls are translated to ZOIDFS, an efficient parallel file I/O API that we defined. ZOIDFS calls are forwarded over the interconnecting network to the I/O nodes, which run ZOIDFS server code that efficiently translates the calls to the native API of the underlying parallel file system.
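As a purely illustrative sketch of the server side, the following C fragment replays a decoded write request against the simplest backend, plain POSIX pwrite; the write_request_t structure and the handle_to_fd helper are hypothetical stand-ins, not the project's actual types.

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/types.h>
    #include <unistd.h>          /* pwrite */

    /* Hypothetical decoded form of a forwarded ZOIDFS write request;
     * the real wire format is defined by the ZOIDFS protocol. */
    typedef struct {
        uint64_t        handle_id;    /* opaque file handle from the client */
        size_t          file_count;   /* number of (offset, length) regions */
        const uint64_t *file_starts;  /* offset of each file region */
        const uint64_t *file_sizes;   /* length of each file region */
        const char     *data;         /* coalesced payload received over BMI */
    } write_request_t;

    /* Hypothetical lookup mapping the client's opaque handle to a local
     * file descriptor; stubbed here so the sketch stands alone. */
    static int handle_to_fd(uint64_t handle_id) { (void)handle_id; return -1; }

    /* Replay one forwarded write against the simplest backend, POSIX;
     * a PVFS, Lustre, or PanFS driver would substitute its native calls. */
    int serve_write(const write_request_t *req)
    {
        int fd = handle_to_fd(req->handle_id);
        size_t consumed = 0;

        for (size_t i = 0; i < req->file_count; i++) {
            ssize_t n = pwrite(fd, req->data + consumed,
                               req->file_sizes[i], (off_t)req->file_starts[i]);
            if (n < 0)
                return -1;  /* a real server would map errno to a protocol error */
            consumed += (size_t)n;  /* (and retry any partial writes) */
        }
        return 0;
    }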
Focus areas
- portability across networks: we are developing ports for IBM Blue Gene, Cray XT, Roadrunner, and Linux clusters.
- support for multiple filesystems: we are developing support for PVFS, Lustre, PanFS, and the standard POSIX API, and will also provide a pluggable infrastructure for other filesystems.
- hooking to applications: MPI-IO via ROMIO, and POSIX via the SYSIO library, a modified GNU libc, or FUSE (see the POSIX example after this list).
- performance: in a high-end computing project, performance is of paramount importance, so we are working at every step to ensure that the software we develop introduces no unnecessary overhead.
- security: we are designing and implementing a general, configurable security mechanism at the data access level. This mechanism will be scalable, fine-grained, object-based, and capable of using externally generated capability keys sharable across parallel tasks.
- testing: because we are building a production-quality software framework, testing and hardening are critical tasks. Testing will be conducted in three general categories: functionality testing, security testing, and performance testing.
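Because the forwarding layer hooks in beneath the POSIX interface (via SYSIO, a modified GNU libc, or FUSE), applications need no source changes. A minimal sketch: the ordinary POSIX sequence below would be intercepted and forwarded as ZOIDFS operations; the file path is a placeholder.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        /* An unmodified application: these calls are intercepted below the
         * POSIX layer and shipped to an I/O node as ZOIDFS operations.
         * The path is a placeholder for a file on the parallel file system. */
        int fd = open("/pfs/scratch/output.dat", O_CREAT | O_WRONLY, 0644);
        if (fd < 0) { perror("open"); return 1; }

        const char buf[] = "checkpoint data";
        if (write(fd, buf, strlen(buf)) < 0)
            perror("write");

        close(fd);
        return 0;
    }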
Leveraging existing projects
- ZOIDFS is a filesystem-independent protocol for I/O forwarding, somewhat similar to NFSv3. Highlights include stateless servers, file handles (rather than file descriptors) that can be freely exchanged among clients, and maximally flexible read/write operations that can handle multiple file regions and memory buffers in one call (see the sketch after this list).
- SYSIO is a flexible, user-space implementation of the virtual file system abstraction, developed for Cray XT. It supplies all of the normal POSIX calls to the application and assembles the local name space on the compute node based on global configuration information. We are augmenting SYSIO for this project to use our I/O forwarding framework underneath for access to file systems.
- ROMIO is the foundation of most MPI-IO implementations in use. We are implementing support for the ZOIDFS client interface in ROMIO, so that applications using MPI-IO can take advantage of our I/O forwarding infrastructure without modification (see the MPI-IO example after this list).
- BMI is a portable communication layer developed for PVFS, with support for MX, sockets, Portals, and other transports. It serves as the communication interface between the compute nodes and the I/O nodes.
- ZOID (the ZeptoOS I/O Daemon) is an extensible, high-performance function call forwarding infrastructure developed for IBM Blue Gene within the ZeptoOS project. ZOID is used as a low-level transfer layer for the Blue Gene driver of BMI.
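To make the ZOIDFS highlights above concrete, here is a sketch of a single write covering two memory buffers and two file regions; the declarations are modeled on the design described above and are illustrative, not the exact zoidfs.h API.

    #include <stddef.h>
    #include <stdint.h>

    /* Illustrative ZOIDFS-style declarations; the actual types and
     * signatures are defined by the project's headers. */
    typedef struct { char data[32]; } zoidfs_handle_t;  /* opaque file handle */

    /* Stub standing in for the real forwarding client library, so this
     * sketch compiles on its own. */
    int zoidfs_write(const zoidfs_handle_t *handle,
                     size_t mem_count, const void *mem_starts[],
                     const size_t mem_sizes[],
                     size_t file_count, const uint64_t file_starts[],
                     const uint64_t file_sizes[])
    {
        (void)handle; (void)mem_count; (void)mem_starts; (void)mem_sizes;
        (void)file_count; (void)file_starts; (void)file_sizes;
        return 0;
    }

    void example(const zoidfs_handle_t *handle, char *a, char *b)
    {
        /* Two memory buffers ... */
        const void *mem_starts[2] = { a, b };
        size_t      mem_sizes[2]  = { 4096, 4096 };

        /* ... written to two non-contiguous file regions in ONE call,
         * which the I/O node can coalesce before hitting the file system. */
        uint64_t file_starts[2] = { 0, 1 << 20 };
        uint64_t file_sizes[2]  = { 4096, 4096 };

        zoidfs_write(handle, 2, mem_starts, mem_sizes,
                     2, file_starts, file_sizes);
    }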
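Similarly, because ROMIO underlies most MPI-IO implementations, an unmodified MPI-IO program such as the following would ride the forwarding path once the ZOIDFS driver is in place; the file name is a placeholder.

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Standard MPI-IO: no source changes are needed for the
         * application to go through the I/O forwarding layer. */
        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "/pfs/scratch/output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        double val = (double)rank;
        MPI_File_write_at_all(fh, rank * (MPI_Offset)sizeof(double),
                              &val, 1, MPI_DOUBLE, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }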
This project is supported by the DOE Office of Science and NNSA. Support is also provided by the National Science Foundation under grant numbers 0937928 and 0724599.