|Title||Cluster-to-Cluster Data Transfer with Data Compression over Wide-Area Networks |
|Publication Type||Journal Article |
|Year of Publication||2014 |
|Authors||Jung, E, Kettimuthu, R, Vishwanath, V |
|Journal||Journal of Parallel and Distributed Computing |
|Date Published||09/2014 |
|Other Numbers||ANL/MCS-P5203-0914 |
|Abstract||The recent emergence of ultra-high-speed networks of up to 100 Gb/s has posed numerous challenges and has led to many investigations of efficient protocols to saturate 100 Gb/s links. However, end-to-end data transfers involve many components besides protocols that affect overall transfer performance. These components include the disk I/O subsystem, additional computation associated with data streams, and network adapters. For example, the bandwidth achievable by TCP may not be realized if disk I/O or the CPU becomes a bottleneck in the end-to-end data transfer. In this paper, we first model all the system components involved in end-to-end data transfer as a graph. We then formulate the problem of maximizing data transfer throughput using parallel data flows. We also propose a variable data flow GridFTP XIO stack to improve data transfer with data compression. Our contributions lie in how to optimize data transfers considering all the system components involved rather than in accurately modeling those components. Our proposed formulations and solutions are evaluated through experiments on the ESnet 100G testbed and a wide-area cluster-to-cluster testbed. The experimental results on the ESnet 100G testbed show that our approach is several times faster than Globus Online: 8x faster for datasets with many 10 MB files and 3-4x faster for other datasets with larger files. The experimental results on the cluster-to-cluster testbed show that our variable data flow approach is up to 4x faster than a normal cluster data transfer.