|Title||FusionFS: Towards Supporting Data-Intensive Scientific Applications on Extreme-Scale High-Performance Computing Systems |
|Publication Type||Conference Paper |
|Year of Publication||2014 |
|Authors||Zhao, D, Zhang, Z, Zhou, X, Li, T, Kimpe, D, Carns, PH, Ross, RB, Raicu, I |
|Conference Name||2014 IEEE International Conference on Big Data |
|Other Numbers||ANL/MCS-P5191-0814 |
|Abstract||The state-of-the-art yet decades-old architecture of high-performance computing (HPC) systems separates compute and storage resources. This design has shown its limits for today's data-intensive scientific applications, because every I/O operation must traverse the network between the compute and storage subsystems. This paper proposes a distributed storage layer local to the compute nodes, which handles most I/O operations and eliminates a vast amount of data movement between compute and storage resources. We have designed and implemented a system prototype of this architecture, the FusionFS distributed file system, to support metadata-intensive and write-intensive operations, both of which are critical to the I/O performance of scientific applications. FusionFS has been deployed and evaluated on up to 16K compute nodes of an IBM Blue Gene/P supercomputer, showing more than an order of magnitude performance improvement over other popular file systems such as GPFS, PVFS, and HDFS.