|Publication Type||Conference Paper|
|Year of Publication||2007|
|Authors||Bresnahan, J, Link, M, Kettimuthu, R, Fraser, D, Foster, IT|
|Conference Name||TeraGrid '07|
|Conference Location||Madison, WI|
GridFTP is an exceptionally fast transfer protocol for large volumes of data. Implementations of it are widely deployed and used in well-connected Grid environments such as those of the TeraGrid because of its ability to scale to network speeds. However, when the data is partitioned into many small files instead of a few large files, it suffers from lower transfer rates. The latency between the serialized transfer requests for each file directly reduces the amount of time the data pathways are active, thus lowering achieved throughput. Further, when a data pathway is inactive, the TCP window closes, and TCP must go through the slow-start algorithm again. The performance penalty can be severe. This situation is known as the "lots of small files" problem. In this paper we introduce a solution to this problem. This solution, called pipelining, allows many transfer requests to be issued while a data transfer is in progress. We present an implementation and performance study of the pipelining solution.
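The effect of request latency on many small files can be illustrated with a toy timing model. This is a minimal sketch and not the paper's implementation or measurements: the function names, the single round trip charged per request, and the example figures are all assumptions, and TCP slow start is ignored, so the real serialized penalty would likely be larger.

```python
def serialized_time(num_files, file_size_bytes, bandwidth_bps, rtt_s):
    """Toy model: every file waits one request round trip before its data flows."""
    data_time = file_size_bytes * 8 / bandwidth_bps
    return num_files * (rtt_s + data_time)


def pipelined_time(num_files, file_size_bytes, bandwidth_bps, rtt_s):
    """Toy model: requests are queued ahead of the data, so only the
    first request's round trip leaves the data pathway idle."""
    data_time = file_size_bytes * 8 / bandwidth_bps
    return rtt_s + num_files * data_time


def throughput_bps(num_files, file_size_bytes, elapsed_s):
    """Achieved throughput in bits per second over the whole transfer."""
    return num_files * file_size_bytes * 8 / elapsed_s


if __name__ == "__main__":
    # Hypothetical workload: 1000 files of 1 MB over a 1 Gbit/s link, 50 ms RTT.
    n, size, bw, rtt = 1000, 1_000_000, 1e9, 0.05
    for name, model in (("serialized", serialized_time), ("pipelined", pipelined_time)):
        t = model(n, size, bw, rtt)
        print(f"{name}: {t:.2f} s, {throughput_bps(n, size, t) / 1e6:.0f} Mbit/s")
```

Under these assumed numbers the serialized transfer takes roughly 58 s while the pipelined one takes about 8 s, because the per-file round trip dominates the 8 ms needed to move each file's data.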