ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing
|Title||ParaMEDIC: Parallel Metadata Environment for Distributed I/O and Computing|
|Publication Type||Conference Proceedings|
|Year of Publication||2007|
|Authors||Balaji, P, Feng, W, Archuleta, J, Lin, H, Kettimuthu, R, Thakur, R|
|Conference Name||13th ACM SIGPLAN Principles and Practice of Parallel Programming (PPoPP 2008)|
|Conference Location||Salt Lake City, UT|
BLAST is a widely used software toolkit for genomic sequence search. mpiBLAST is a freely available, open-source parallelization of BLAST that uses database segmentation to let different worker processors search distinct segments of the database in parallel. After searching, the workers write their output to a filesystem. While mpiBLAST has been shown to achieve high performance on clusters with fast local filesystems, its I/O processing remains a scalability concern, especially on systems with limited I/O capabilities, such as those using distributed filesystems spread across a wide-area network.
Thus, we present ParaMEDIC, an environment that decouples computation and I/O in distributed environments for applications such as mpiBLAST and dramatically reduces I/O overhead through metadata processing. Specifically, for mpiBLAST, ParaMEDIC partitions worker processes into compute workers and I/O workers. Compute workers, instead of writing output directly to the distributed filesystem, convert their output to metadata and send it to I/O workers. I/O workers, which physically reside closer to the actual storage, then process this metadata to re-create the actual output and write it to the filesystem. This approach allows ParaMEDIC to reduce I/O time, accelerating mpiBLAST by as much as 25-fold in some cases.
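The compute/I/O split described above can be illustrated with a toy, single-process sketch. It is not ParaMEDIC or mpiBLAST code: the abstract does not specify the metadata format, so the `Hit` record (sequence ID plus offset), the exact-substring "search", the in-memory `DATABASE`, and all function names here are hypothetical, and the parallel workers are simulated with a sequential loop. The point is only that the compute side ships small records while the I/O side, which is assumed to have its own access to the database, regenerates the verbose report.

```python
from dataclasses import dataclass

# Toy database: sequence ID -> sequence. Assumption: both the compute
# workers and the I/O worker can read it.
DATABASE = {
    "seq1": "ACGTACGTGACCA",
    "seq2": "TTGACCATTGACC",
    "seq3": "GGGGCCCCAAAAT",
    "seq4": "CATGACCAGTACG",
}


@dataclass(frozen=True)
class Hit:
    """Hypothetical metadata record: enough to rebuild one report entry."""
    seq_id: str
    offset: int


def format_hit(query: str, hit: Hit) -> str:
    """Build the verbose report entry for one hit (the 'actual output')."""
    seq = DATABASE[hit.seq_id]
    marker = " " * hit.offset + "^" * len(query)
    return (
        f"query={query} subject={hit.seq_id} offset={hit.offset}\n"
        f"{seq}\n{marker}\n"
    )


def segment_database(seq_ids: list[str], n_workers: int) -> list[list[str]]:
    """Database segmentation: give each worker a disjoint slice of IDs."""
    return [seq_ids[i::n_workers] for i in range(n_workers)]


def compute_worker(query: str, segment: list[str]) -> list[Hit]:
    """Search one segment and return compact metadata, not formatted text."""
    hits = []
    for seq_id in segment:
        seq = DATABASE[seq_id]
        start = seq.find(query)
        while start != -1:
            hits.append(Hit(seq_id, start))
            start = seq.find(query, start + 1)
    return hits


def io_worker(query: str, metadata: list[Hit]) -> str:
    """Re-create the full report from metadata, next to the storage."""
    ordered = sorted(metadata, key=lambda h: (h.seq_id, h.offset))
    return "".join(format_hit(query, h) for h in ordered)


def run(query: str, n_compute_workers: int = 2) -> tuple[list[Hit], str]:
    """Drive the pipeline; the loop stands in for parallel compute workers."""
    segments = segment_database(sorted(DATABASE), n_compute_workers)
    metadata: list[Hit] = []
    for segment in segments:
        metadata.extend(compute_worker(query, segment))
    return metadata, io_worker(query, metadata)


if __name__ == "__main__":
    metadata, report = run("GACCA")
    print(f"{len(metadata)} metadata records sent to the I/O worker")
    print(report)
```

In this sketch each metadata record is a sequence ID and an integer, while each regenerated report entry carries the full subject sequence and an alignment marker, so the data crossing the (simulated) wide-area link grows with the number of hits rather than with the size of the formatted output.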