GridFTP is a very fast transfer protocol for large volumes of data. However, when the data is partitioned into many small files, it has historically suffered from low transfer rates because of the latency between transfer requests. This is known as the 'lots of small files' (LOSF) problem.
Pipelining is designed to solve this problem. We modified GridFTP to allow many transfer requests to be outstanding at once, so the client no longer waits for one request to complete before sending the next. This gives large performance increases on LOSF workloads. The latency between requests is hidden in the time it takes to transfer the previous file: by the time the first file has completed, the second request is already queued at the server, ready to start. The graphs here show the performance increases.
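To make the latency-hiding argument concrete, here is a minimal sketch of an idealized timing model. It is not the GridFTP implementation and it does not reproduce our measurements. The function names and the bandwidth and latency values are hypothetical, chosen only for illustration, and the model ignores TCP slow start, disk overhead, and server processing time. Without pipelining, every file pays the request latency. With pipelining, only the first request's latency is exposed.

```python
def transfer_time(num_files, file_size_bytes, bandwidth_bps, latency_s, pipelined):
    """Idealized total time (seconds) to move num_files equal-sized files."""
    # Time spent actually moving one file's bytes over the link.
    data_time = file_size_bytes * 8 / bandwidth_bps
    if pipelined:
        # Later requests are already queued at the server, so only the
        # first request's latency is visible; files then run back to back.
        return latency_s + num_files * data_time
    # Each request is sent only after the previous transfer completes,
    # so every file pays the full request latency.
    return num_files * (latency_s + data_time)


def throughput_mbps(total_bytes, seconds):
    """Average throughput in megabits per second."""
    return total_bytes * 8 / seconds / 1e6


if __name__ == "__main__":
    # All three values are assumptions for illustration, not measured data.
    TOTAL_BYTES = 10**9      # assumes 1 GB means 10^9 bytes
    BANDWIDTH_BPS = 1e9      # hypothetical 1 Gbit/s link
    LATENCY_S = 0.001        # hypothetical 1 ms request latency

    for file_size in (10**4, 10**5, 10**6, 10**7, 10**8):
        n = TOTAL_BYTES // file_size
        plain = transfer_time(n, file_size, BANDWIDTH_BPS, LATENCY_S, False)
        piped = transfer_time(n, file_size, BANDWIDTH_BPS, LATENCY_S, True)
        print(f"{file_size:>10} B/file  "
              f"no pipelining: {throughput_mbps(TOTAL_BYTES, plain):8.1f} Mbit/s  "
              f"pipelining: {throughput_mbps(TOTAL_BYTES, piped):8.1f} Mbit/s")
```

In this model the gap between the two modes widens as files shrink, because the per-file latency term becomes large relative to the per-file data time. Real transfers have additional per-file costs, so pipelined throughput does eventually fall off for very small files.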
Our experiments transfer 1 GB of data partitioned into files of equal size. We measure the bandwidth achieved as a function of the file size. The first experiment was done on the UC TeraGrid LAN. The results are shown below.
The red line shows the results with pipelining and the green line shows the results without it. With pipelining, we can sustain peak transfer rates for much smaller files. The throughput does not tail off until files become smaller than 100 KB.
The second experiment was performed over the wide area network between a UC TeraGrid node and an SDSC TeraGrid node.