ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing

Title: ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing
Publication Type: Conference Paper
Year of Publication: 2007
Authors: Balaji, P, Feng, W, Archuleta, J, Lin, H, Kettimuthu, R, Thakur, R
Conference Name: 13th ACM SIGPLAN Principles and Practice of Parallel Programming (PPoPP 2008)
Conference Location: Salt Lake City, UT
Other Numbers: ANL/MCS-P1452-0807
Abstract: BLAST is a widely used software toolkit for genomic sequence search. mpiBLAST is a freely available, open-source parallelization of BLAST that uses database segmentation to allow different worker processors to search (in parallel) unique segments of the database. After searching, the workers write their output to a filesystem. While mpiBLAST has been shown to achieve high performance in clusters with fast local filesystems, its I/O processing remains a concern for scalability, especially in systems having limited I/O capabilities such as those using distributed filesystems spread across a wide-area network. Thus, we present ParaMEDIC, an environment that decouples computation and I/O in distributed environments for applications such as mpiBLAST and dramatically reduces I/O overhead through metadata processing. Specifically, for mpiBLAST, ParaMEDIC partitions worker processes into compute and I/O workers. Compute workers, instead of directly writing output to the distributed filesystem, convert their output to metadata and send it to I/O workers. I/O workers, which physically reside closer to the actual storage, then process this metadata to re-create the actual output and write it to the filesystem. This approach allows ParaMEDIC to cut down on the I/O time, thus accelerating mpiBLAST by as much as 25-fold in some cases.
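
The compute/I/O split described in the abstract can be pictured with a small single-process Python sketch. This is an illustration only, not the ParaMEDIC or mpiBLAST implementation: the toy DATABASE, the exact-substring search standing in for BLAST, the JSON metadata layout, and every function name (search, format_output, compute_worker, io_worker) are assumptions made for this example. It also assumes the I/O side holds its own copy of the database, and it runs both roles in one process with no real network or MPI communication.

```python
import json

# Toy "database": sequence id -> sequence (hypothetical data, not real genomic records).
DATABASE = {
    "seq1": "ACGTACGTGGA",
    "seq2": "TTGACCA",
    "seq3": "GGACGTTT",
}


def search(query, database):
    """Stand-in for a sequence search: exact substring match (this is not BLAST)."""
    hits = []
    for seq_id, seq in sorted(database.items()):
        offset = seq.find(query)
        if offset != -1:
            hits.append((seq_id, offset))
    return hits


def format_output(query, hits, database):
    """Render the full, verbose report that would be written to the filesystem."""
    lines = [f"Query: {query}"]
    for seq_id, offset in hits:
        seq = database[seq_id]
        lines.append(f"> {seq_id} (length {len(seq)}) match at offset {offset}")
        lines.append(f"  {seq}")
        lines.append("  " + " " * offset + "^" * len(query))
    return "\n".join(lines) + "\n"


def compute_worker(query, database):
    """Search, then ship compact metadata instead of the full report."""
    hits = search(query, database)
    return json.dumps({"query": query, "hits": hits})


def io_worker(metadata, database):
    """Near the storage: rebuild the full report from metadata and a local database copy."""
    record = json.loads(metadata)
    # JSON turns tuples into lists; convert back so the hit format matches search().
    hits = [(seq_id, offset) for seq_id, offset in record["hits"]]
    return format_output(record["query"], hits, database)


if __name__ == "__main__":
    query = "ACGT"
    direct = format_output(query, search(query, DATABASE), DATABASE)
    metadata = compute_worker(query, DATABASE)
    rebuilt = io_worker(metadata, DATABASE)
    assert rebuilt == direct  # the I/O side reproduces the report exactly
    print(len(metadata), "bytes of metadata vs", len(direct), "bytes of output")
```

In this sketch only the metadata string would cross the slow link, and the I/O worker pays extra computation to regenerate the report next to the storage. How large the saving is depends on how verbose the real output is relative to its metadata.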