Argonne National Laboratory

Semantics-based Distributed I/O with the ParaMEDIC Framework

Title: Semantics-based Distributed I/O with the ParaMEDIC Framework
Publication Type: Conference Paper
Year of Publication: 2008
Authors: Balaji, P, Feng, W, Lin, H
Conference Name: High Performance Distributed Computing 2008 (HPDC 2008)
Date Published: 2008
Conference Location: Boston, MA
Other Numbers: ANL/MCS-P1481-0108

Many large-scale applications rely on multiple resources simultaneously for efficient execution. For example, such applications may require both large compute and storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site often has to be moved to a remote storage site for either storage or visualization and analysis. Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network. We present a framework called “ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing” that uses application-specific semantic information to convert the generated data to orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and reprocess the metadata at the storage site to regenerate the output. Specifically, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in data that needs to be transferred in distributed environments.
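
The three steps described in the abstract (reduce the output to metadata at the compute site, transfer the metadata, regenerate the output at the storage site) can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not ParaMEDIC's implementation: the "application" is a hypothetical record search, the metadata is simply the list of matching record indices, and both sites are assumed to have access to the same input database. All function and variable names are invented for illustration.

```python
# Toy illustration only: the application, its data, and every name below are
# hypothetical and are not taken from ParaMEDIC itself.
import json

# Assumption: both the compute site and the storage site can read the same
# input database, so record indices are enough to rebuild the output.
DATABASE = ["record-%04d:" % i + "ACGT" * 50 for i in range(1000)]


def run_application(query_digit):
    """Compute site: produce the full (large) output of the application."""
    return [rec for rec in DATABASE if rec.split(":")[0].endswith(query_digit)]


def to_metadata(output):
    """Compute site: reduce the output to compact metadata (record indices)."""
    index_of = {rec: i for i, rec in enumerate(DATABASE)}
    return json.dumps([index_of[rec] for rec in output]).encode()


def regenerate(metadata):
    """Storage site: post-process the metadata to rebuild the full output."""
    return [DATABASE[i] for i in json.loads(metadata)]


output = run_application("7")
metadata = to_metadata(output)        # only this would cross the wide-area network
regenerated = regenerate(metadata)    # the extra computation traded for less transfer

full_size = len("\n".join(output).encode())
print("full output: %d bytes, metadata: %d bytes" % (full_size, len(metadata)))
```

In this sketch the lookup in `regenerate` stands in for the additional post-processing the abstract mentions; how much smaller the metadata is depends entirely on the application's semantics, so the sizes printed here say nothing about the reductions reported for the real framework.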