Parallel I/O Optimization for Scalable Deep Learning
|Title||Parallel I/O Optimization for Scalable Deep Learning|
|Year of Publication||2017|
|Authors||Pumma, S, Si, M, Feng, W, Balaji, P|
As deep learning systems continue to grow in importance, several researchers have been analyzing approaches to make such systems efficient and scalable on high-performance computing platforms. As computational parallelism increases, however, data I/O becomes the major bottleneck limiting the overall system scalability. In this paper, we continue our efforts to improve LMDB, the I/O subsystem of the Caffe deep learning framework. In a previous paper, we presented LMDBIO, an optimized I/O plugin for Caffe that takes into account the data access pattern of Caffe in order to vastly improve I/O performance. Nevertheless, LMDBIO’s optimizations are limited to intranode performance, and LMDBIO does little to minimize the I/O inefficiencies in distributed-memory environments. In this paper, we propose LMDBIO-2.0, an enhanced version of LMDBIO that optimizes the I/O access of Caffe in distributed-memory environments. We present several sophisticated data I/O techniques that allow for significant improvement in such environments. Our experimental results show that LMDBIO-2.0 can improve the overall execution time of Caffe by more than 30-fold compared with LMDB and by 2-fold compared with LMDBIO.