Argonne National Laboratory

Cluster-to-Cluster Data Transfer with Data Compression over Wide-Area Networks

Title: Cluster-to-Cluster Data Transfer with Data Compression over Wide-Area Networks
Publication Type: Journal Article
Year of Publication: 2014
Authors: Jung, E., Kettimuthu, R., Vishwanath, V.
Journal: Journal of Parallel and Distributed Computing
Date Published: 09/2014
Other Numbers: ANL/MCS-P5203-0914
Abstract: The recent emergence of ultra-high-speed networks of up to 100 Gb/s has posed numerous challenges and has prompted many investigations into efficient protocols capable of saturating 100 Gb/s links. However, end-to-end data transfers involve many components besides protocols that affect overall transfer performance. These components include the disk I/O subsystem, additional computation associated with data streams, and network adapters. For example, the bandwidth achievable by TCP may not be realized if disk I/O or the CPU becomes a bottleneck in an end-to-end data transfer. In this paper, we first model all the system components involved in end-to-end data transfer as a graph. We then formulate an optimization problem whose goal is to maximize data transfer throughput using parallel data flows. We also propose a variable data flow GridFTP XIO stack to improve data transfer with data compression. Our contribution lies in optimizing data transfers by considering all the system components involved, rather than in modeling those components accurately. We evaluate the proposed formulations and solutions through experiments on the ESnet 100G testbed and a wide-area cluster-to-cluster testbed. The results on the ESnet 100G testbed show that our approach is several times faster than Globus Online: 8x faster for datasets with many 10 MB files and 3-4x faster for other datasets with larger files. The results on the cluster-to-cluster testbed show that our variable data flow approach is up to 4x faster than a conventional cluster data transfer.
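As a rough illustration of why the slowest component, rather than the network, can bound end-to-end throughput, the sketch below models a hypothetical transfer from two sender nodes as a capacitated graph and computes its maximum flow with the Edmonds-Karp algorithm. Each edge stands for one component (disk, CPU, NIC, WAN link), and its capacity is that component's throughput. The topology, node names, and capacities are invented for illustration; this is a generic max-flow sketch, not the formulation used in the paper.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp max flow; capacity maps (u, v) edges to integer capacities."""
    # Residual graph: forward edges start at their capacity, reverse edges at 0.
    residual = defaultdict(lambda: defaultdict(int))
    for (u, v), c in capacity.items():
        residual[u][v] += c
        residual[v][u] += 0
    total = 0
    while True:
        # Breadth-first search for the shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total  # no augmenting path left
        # The path's bottleneck is its smallest residual capacity.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        # Push the bottleneck amount along the path.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck


# Hypothetical capacities in Gb/s; each edge is one system component.
links = {
    ("src", "n1_a"): 12,   # node 1 disk read
    ("n1_a", "n1_b"): 30,  # node 1 CPU (e.g., compression)
    ("n1_b", "wan"): 40,   # node 1 NIC
    ("src", "n2_a"): 25,   # node 2 disk read
    ("n2_a", "n2_b"): 18,  # node 2 CPU
    ("n2_b", "wan"): 40,   # node 2 NIC
    ("wan", "dst"): 100,   # shared WAN link
}

if __name__ == "__main__":
    print(max_flow(links, "src", "dst"))
```

With these capacities the maximum flow is 30 Gb/s, limited by node 1's disk (12) and node 2's CPU (18) rather than by the 100 Gb/s WAN link. Lowering the WAN capacity to 25 makes the network the bottleneck instead.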