Global-Scale Distributed I/O with ParaMEDIC
|Title||Global-Scale Distributed I/O with ParaMEDIC|
|Publication Type||Journal Article|
|Year of Publication||2010|
|Authors||Balaji, P, Feng, W, Lin, H, Archuleta, J, Matsuoka, S, Warren, A, Setubal, J, Lusk, EL, Thakur, R, Foster, IT, Katz, DS, Jha, S, Shinpaugh, K, Coghlan, SM, Reed, D|
|Journal||Concurrency and Computation: Practice and Experience|
Achieving high performance for distributed I/O on a wide-area network continues to be an elusive holy grail. Despite enhancements in network hardware as well as software stacks, achieving high performance remains a challenge. In this paper, our worldwide team took a completely new and non-traditional approach to distributed I/O, called ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing, by utilizing application-specific transformation of data to orders-of-magnitude smaller metadata before performing the actual I/O. Specifically, this paper details our experiences in deploying a large-scale system to facilitate the discovery of missing genes and the construction of a genome similarity tree by encapsulating the mpiBLAST sequence-search algorithm into ParaMEDIC. The overall project involved nine computational sites spread across the U.S. and generated more than a petabyte of data that was "teleported" to a large-scale facility in Tokyo for storage.
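The core idea in the abstract, shipping compact application-specific metadata across the wide-area link instead of the full output, can be illustrated with a toy sketch. Everything here is an assumption for illustration: the tiny `DATABASE`, the substring-based `search` stand-in, and the helper names `to_metadata` and `from_metadata` are hypothetical and are not the mpiBLAST or ParaMEDIC API. The sketch also assumes the storage site rebuilds the output by rerunning the search on only the matched sequences, a detail the abstract does not spell out.

```python
import json

# Toy "database": sequence id -> sequence text (hypothetical data).
DATABASE = {
    "seq1": "ACGTACGTGGA",
    "seq2": "TTGACCA",
    "seq3": "GGACGTTT",
    "seq4": "CCCCCCCC",
}


def search(query, database):
    """Stand-in for an expensive search: report each sequence containing query."""
    lines = []
    for seq_id in sorted(database):
        seq = database[seq_id]
        pos = seq.find(query)
        if pos >= 0:
            lines.append(f"{seq_id} offset={pos} sequence={seq}")
    return "\n".join(lines)


def to_metadata(query, report):
    """Compute site: keep only the query and the ids of matching sequences."""
    ids = [line.split()[0] for line in report.splitlines()]
    return json.dumps({"query": query, "ids": ids})


def from_metadata(metadata, database):
    """Storage site (assumed step): rerun the search on the matched subset only."""
    meta = json.loads(metadata)
    subset = {seq_id: database[seq_id] for seq_id in meta["ids"]}
    return search(meta["query"], subset)


def demo():
    report = search("ACGT", DATABASE)              # full output at the compute site
    metadata = to_metadata("ACGT", report)         # what would cross the network
    regenerated = from_metadata(metadata, DATABASE)  # output rebuilt at storage
    return report, metadata, regenerated
```

In this sketch only `metadata` would travel between sites; the regenerated report matches the original because the second search runs over exactly the sequences that matched the first time. The size gap in a toy example is small and says nothing about the savings reported in the paper.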