Argonne National Laboratory

FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems

Title: FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems
Publication Type: Conference Paper
Year of Publication: 2014
Authors: Zhao, D, Zhang, Z, Zhou, X, Li, T, Kimpe, D, Carns, PH, Ross, RB, Raicu, I
Conference Name: 2014 IEEE International Conference on Big Data
Other Numbers: ANL/MCS-P5191-0814
Abstract: State-of-the-art yet decades old architecture of high performance computing (HPC) systems has its compute and storage resources separated. It has shown limits for today’s data-intensive scientific applications, because every I/O needs to be transferred via the network between the compute and storage cliques. This paper proposes a distributed storage layer local to the compute nodes, which is responsible for most of the I/O operations and saves extreme amount of data movement between compute and storage resources. We have designed and implemented a system prototype of such architecture–the FusionFS distributed file system–to support metadata-intensive and write-intensive operations, both of which are critical to the I/O performance of scientific applications. FusionFS has been deployed and evaluated on up to 16K compute nodes in an IBM Blue Gene/P supercomputer, showing more than an order of magnitude performance improvement over other popular file systems such as GPFS, PVFS, and HDFS.
PDF: http://www.mcs.anl.gov/papers/P5191-0814.pdf