Globus XIO Pipe Open Driver: Enabling GridFTP to Leverage Standard Unix Tools

Publication Type: Conference Paper
Year of Publication: 2011
Authors: Kettimuthu, R, Link, S, Bresnahan, J, Link, M, Foster, IT
Conference Name: Proc. TeraGrid 2011
Date Published: 07/2011
Conference Location: Austin, Texas
Other Numbers: ANL/MCS-P1899-0611

Scientific research in all disciplines inevitably creates large volumes of data throughout the process of discovery, analysis, and conclusion. Because data must be shared and relocated, members of the scientific community often suffer a productivity loss tied to the time spent transferring data. The GridFTP protocol was developed to improve this situation by addressing the performance, reliability, and security limitations of standard FTP and other commonly used data movement tools such as SCP. The Globus implementation of GridFTP is widely used to move data rapidly and reliably between geographically distributed systems. GridFTP has traditionally performed well for datasets containing large files, but it achieves lower transfer rates when the data is partitioned into many small files. Although GridFTP's pipelining and concurrency features improve transfer rates for datasets with many small files, they cannot be applied in environments that have strict firewall rules. In such scenarios, tarring up the files in a dataset on the fly can help. In some scenarios compression is desired; in others, a checksum of the files after they are written to disk is desired. Unix provides robust system tools that perform these tasks (tar, compress, checksum, etc.). In this paper, we present the Globus XIO Pipe Open Driver (Popen), which enables GridFTP to leverage standard Unix tools to perform such tasks. We show how this driver is used in GridFTP to provide a number of useful features, and we demonstrate the effectiveness of this functionality through an experimental study.
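The Popen driver itself is part of Globus XIO and is written in C; the following is not its API. It is a minimal Python sketch of the general idea the abstract describes, assuming a POSIX `tar` is available on the PATH: spawn a standard Unix tool and treat its standard output as the data stream, so that a directory of many small files becomes one continuous stream produced on the fly. The names `stream_tar` and `chunk_size` are hypothetical and chosen only for illustration.

```python
import subprocess
from typing import Iterator


def stream_tar(directory: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a tar archive of `directory` in chunks as the tar command produces it.

    Hypothetical helper illustrating the pipe-open idea; not the Globus XIO API.
    """
    # "-f -" makes tar write the archive to stdout; "-C directory ." archives
    # the directory's contents with relative paths.
    proc = subprocess.Popen(
        ["tar", "-cf", "-", "-C", directory, "."],
        stdout=subprocess.PIPE,
    )
    try:
        # Read fixed-size chunks until EOF (read returns b"" at end of stream),
        # handing each one to the consumer as soon as it is available.
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            yield chunk
    finally:
        # Always release the pipe and reap the child process.
        proc.stdout.close()
        proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"tar exited with status {proc.returncode}")
```

A consumer (for example, code that writes each chunk to a single network connection) never sees the individual small files, only one byte stream; the same pattern would apply to a compression or checksum command in place of `tar`.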