Semantics-based Distributed I/O with the ParaMEDIC Framework
|Title||Semantics-based Distributed I/O with the ParaMEDIC Framework|
|Publication Type||Conference Paper|
|Year of Publication||2008|
|Authors||Balaji, P, Feng, W, Lin, H|
|Conference Name||High Performance Distributed Computing 2008 (HPDC 2008)|
|Conference Location||Boston, MA|
Many large-scale applications simultaneously rely on multiple resources for efficient execution. For example, such applications may require both large compute and storage resources; however, very few supercomputing centers can provide large quantities of both. Thus, data generated at the compute site often has to be moved to a remote storage site for either storage or visualization and analysis. Clearly, this is not an efficient model, especially when the two sites are distributed over a wide-area network. We present a framework called “ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing” that uses application-specific semantic information to convert the generated data to orders-of-magnitude smaller metadata at the compute site, transfer the metadata to the storage site, and reprocess the metadata at the storage site to regenerate the output. Specifically, ParaMEDIC trades a small amount of additional computation (in the form of data post-processing) for a potentially significant reduction in data that needs to be transferred in distributed environments.
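The convert, transfer, and regenerate steps described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the search application, the function names (`compute_site`, `storage_site`, `render_report`, `direct_output`), and the choice of matching record ids as the metadata are all hypothetical. The sketch also assumes that the storage site holds its own copy of the input database, so that the full output can be rebuilt from the metadata alone.

```python
import json


def render_report(record_id: int, database: list, query: str) -> str:
    """Produce the verbose output line for one matching record (toy format)."""
    text = database[record_id]
    offset = text.find(query)
    return f"record {record_id}: {text!r} contains {query!r} at offset {offset}\n"


def direct_output(database: list, query: str) -> str:
    """Baseline: generate the full output at the compute site and ship all of it."""
    return "".join(
        render_report(i, database, query)
        for i, text in enumerate(database)
        if query in text
    )


def compute_site(database: list, query: str) -> str:
    """Run the full search, but keep only compact metadata.

    Assumption: the ids of the matching records are enough to rebuild the output.
    """
    matching_ids = [i for i, text in enumerate(database) if query in text]
    return json.dumps({"query": query, "ids": matching_ids})


def storage_site(metadata_json: str, database: list) -> str:
    """Regenerate the full output by reprocessing only the records named in the metadata."""
    meta = json.loads(metadata_json)
    return "".join(render_report(i, database, meta["query"]) for i in meta["ids"])


if __name__ == "__main__":
    db = ["ACGTACGT" * 8, "TTTT" * 16, "GGACGTGG" * 8]
    metadata = compute_site(db, "ACGT")        # small payload sent over the WAN
    regenerated = storage_site(metadata, db)   # extra work done at the storage site
    print(len(metadata), "bytes of metadata vs", len(regenerated), "bytes of output")
```

The storage site repeats part of the work (rendering the reports for the matching records), which mirrors the trade described above: a little extra post-processing in exchange for sending far less data across the network.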