Argonne National Laboratory

Global-Scale Distributed I/O with ParaMEDIC

Title: Global-Scale Distributed I/O with ParaMEDIC
Publication Type: Journal Article
Year of Publication: 2010
Authors: Balaji, P, Feng, W, Lin, H, Archuleta, J, Matsuoka, S, Warren, A, Setubal, J, Lusk, EL, Thakur, R, Foster, IT, Katz, DS, Jha, S, Shinpaugh, K, Coghlan, SM, Reed, D
Journal: Concurrency and Computat.: Prac. Exper.
Date Published: 10/2010

Achieving high performance for distributed I/O on a wide-area network continues to be an elusive holy grail. Despite enhancements in network hardware as well as software stacks, achieving high performance remains a challenge. In this paper, our worldwide team took a completely new and non-traditional approach to distributed I/O, called ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing, which uses an application-specific transformation of data into metadata that is orders of magnitude smaller before performing the actual I/O. Specifically, this paper details our experiences in deploying a large-scale system to facilitate the discovery of missing genes and the construction of a genome similarity tree by encapsulating the mpiBLAST sequence-search algorithm into ParaMEDIC. The overall project involved nine computational sites spread across the U.S. and generated more than a petabyte of data that was "teleported" to a large-scale facility in Tokyo for storage.