GridFTP and Yoga

Here in the land of GridFTP development we try to be as flexible and extensible as possible. We created an IO library built entirely around this idea, called GlobusXIO. This has allowed us to add new protocols to GridFTP, like UDT and Phoebus, much more easily. It has also allowed us to achieve greater-than-network speeds by use of compression.

While that has provided us with karate-man center-splits type flexibility, it was not enough for us. We also created the DSI (Data Storage Interface) for GridFTP. Another dynamically loadable plug-in to GridFTP! This has allowed GridFTP servers to be front ends to complex filesystems like SRB and HPSS and, further, allowed us to turn a single GridFTP server into a striped GridFTP server as well as a load-balancing dynamic back-end server.

Ah, now we are starting to feel the burn. Yet even with 2 places for dynamically loadable modules, we did not feel plugged-in enough. We wanted to give our admins more knobs to turn, and excited contributing developers more levers to pull. So we perused our code for proper places to carve out clean abstractions. After months of discussing this over many beers we ultimately found the right spot, the ECM (event callout mumble--the plugin formerly known as ACL). We had a few cups of coffee and then wrote the interface that allowed for NeST to control disk storage reservations and allowed for interaction with CAS and other AUTHZ services. Now we are starting to get our Bird of Paradise on!

But the problem here is, these are all supported features. That's no fun. They will all just work, and if they don't, one only needs to send us email for help. What about the poor hacker out there looking to tinker? What have we left for the poor souls who are not happy unless they can subversively trick a web server into being a TCP router with long UNIX pipe commands? Those poor people can't deal with known and easy-to-use configuration options! T-Rex wants to hunt!

Well, because we care about the hacker (and because we wanted to leverage some quality tools like ssh) we created the GlobusXIO popen (pipe open) driver.

The popen driver allows users to open pipes to the standard IO of existing programs. This allows you to leverage programs much in the way that you can with UNIX pipes (actually, EXACTLY in that way).

So say you want to tar up a directory on the fly and send it via GridFTP? No problem, pipe it through /bin/tar! Want to compress a file before sending it, you say? Pipe that sucker through zip. What's that? You want your ASCII text files to appear as though they were written by The Swedish Chef? We are here for you! Just popen /usr/games/chef and you're speaking fluent Muppet, baby!

We present the following examples here: English to Chef, compressing a file at the destination, and tar on the fly.

Disclaimer

So first and foremost, none of this is really supported. It is a neat trick you can do with Globus GridFTP servers. It will likely work, and work every time, but it is a bit wacky to support on every system, and it could cause security headaches (popen lets authenticated users run stuff remotely). For these reasons we leave this to ambitious, fun-loving sysadmins to play with, and will work with other users in more supported ways to meet their needs.

Whitelist the popen driver

For security reasons you must explicitly allow any XIO driver that will run inside the GridFTP server. You can't have remote clients inserting just any dynamically loadable module, now can you? To allow the globus-gridftp-server to load the popen driver, use the -fs-whitelist option. This tells the server which Globus XIO modules it can load. For our purposes here use:
    globus-gridftp-server -fs-whitelist popen,file,ordering
This will allow clients to use the default file driver (for doing regular stuff), and the popen driver for the fancy stuff here. You will also notice that the ordering driver is listed. It is there because data sent to a pipe needs to be in order. GridFTP data streams are often not in order; with a regular file we seek into the file to reorder them. Unfortunately we cannot seek into popen'ed programs. Fortunately we have the ordering driver, which will (up to memory constraints) reorder things for us.

English to Chef

Run the globus-gridftp-server as mentioned above. For simplicity's sake we will run it in anonymous mode (running it with GSI security is well documented elsewhere):
% globus-gridftp-server -p 5000 -fs-whitelist popen,file -aa 
And now run the client:
% globus-url-copy -src-fsstack popen:argv=#/usr/games/chef#/home/bresnaha/text.txt ftp://localhost:5000/x ftp://localhost:5000/home/bresnaha/trans.txt
Let's take a look at this command line. It is the normal two-URL globus-url-copy command line with one additional argument:
 -src-fsstack popen:argv=#/usr/games/chef#/home/bresnaha/text.txt
This argument tells the source (-src) server to set its file system XIO stack (-fsstack) to use the popen (popen:) driver, with the following #-separated argument list: /usr/games/chef /home/bresnaha/text.txt. Instead of sourcing data from the file named in the source URL, the server sources it from the output of the popen'ed program:
% /usr/games/chef /home/bresnaha/text.txt
This makes the file portion of the source url irrelevant (but it still needs to be there, thus the /x in our example).

In our example the file text.txt is sent to the destination file trans.txt.
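Typing those #-separated strings by hand gets old fast, so here is a minimal sketch of a helper that builds one from a normal argument list. The function name popen_stack is made up for this post and is not part of any Globus tool. It also assumes that an argument containing a literal # cannot be expressed, since # is the separator in the examples above, so it simply refuses such arguments.

```shell
# Hypothetical helper (not a Globus tool): build a popen fsstack string
# from a program path and its arguments, joined with '#'.
popen_stack() {
    stack="popen:argv="
    for arg in "$@"; do
        case "$arg" in
            # Assumption: '#' is the separator, so it cannot appear inside an argument.
            *"#"*) echo "popen_stack: argument contains '#': $arg" >&2; return 1 ;;
        esac
        stack="${stack}#${arg}"
    done
    printf '%s\n' "$stack"
}

# Prints the string used with -src-fsstack in the Chef example.
popen_stack /usr/games/chef /home/bresnaha/text.txt
```

You could then write the client command as globus-url-copy -src-fsstack "$(popen_stack /usr/games/chef /home/bresnaha/text.txt)" followed by the two URLs.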

Compress a file at the destination

Say you want to send a file, but on the remote end you have a limited amount of disk space and you need to store it in a compressed format. Here we show you how to send a file using GridFTP and on the destination end compress it using zip before it is written to disk (note: it probably makes more sense to compress it before sending it, so just take this as the example it is).

Again we start the server in a similar way, but this time we need the ordering driver whitelisted:
% globus-gridftp-server -p 5000 -fs-whitelist popen,file,ordering -aa 
And now run globus-url-copy, this time telling it to change the destination stack to use the popen driver, running /usr/bin/zip with standard input as the source and the file /home/bresnaha/text.txt.zip as the output:
% globus-url-copy -dst-fsstack popen:argv=#/usr/bin/zip#/home/bresnaha/text.txt.zip#-,ordering ftp://localhost:5000/home/bresnaha/text.txt ftp://localhost:5000/y
This time notice that we used -dst-fsstack instead of -src-fsstack (for obvious reasons) and that the destination URL has a fake file, because this time the output file is taken care of by the popen'ed program. Also notice the addition of the ordering driver. By putting it after a comma following the popen string, we add it above the popen driver on the stack and thus ensure that zip will see all of the data in order.
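The comma-separated stack string can be assembled the same way. This is only a sketch, and join_stack is a name invented for this post. It relies on nothing more than what the example above shows: drivers are listed left to right, each one sitting above the one before it.

```shell
# Hypothetical helper: join driver specs with commas.
# The first argument is the bottom of the stack; later ones sit above it.
join_stack() {
    joined=""
    for driver in "$@"; do
        if [ -z "$joined" ]; then
            joined="$driver"
        else
            joined="${joined},${driver}"
        fi
    done
    printf '%s\n' "$joined"
}

# Prints the -dst-fsstack value from the zip example: popen below, ordering above.
join_stack "popen:argv=#/usr/bin/zip#/home/bresnaha/text.txt.zip#-" ordering
```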

Tar on the fly!

The tar on the fly example shows how to use a pipe on both ends at the same time, and solves the problem of transferring many VERY small files at once. To do this, start again by running the server:
% globus-gridftp-server -p 5000 -fs-whitelist popen,file,ordering -aa
Now all we need to do is tell the source server to tar it up and the destination to untar it:
% globus-url-copy -src-fsstack popen:argv=#/bin/tar#-C#/home/bresnaha#-cf#-#TarIt -dst-fsstack popen:argv=#/bin/tar#-xf#-#-C#/home/bresnaha/UnTarIt ftp://localhost:5000/x ftp://localhost:5000/y
This time you will notice that both the source and destination files are bogus. This is because the tar program has already been told where the data comes from via -src-fsstack and where it goes via -dst-fsstack.
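If it helps to picture what is going on, the two tar invocations are the same ones you would connect with a plain UNIX pipe on a single machine, with GridFTP standing in for the pipe. The sketch below is only that local stand-in, using throwaway temp directories and a made-up file name. It does not touch a GridFTP server at all.

```shell
# Local stand-in for the transfer: same tar arguments as above, with a plain
# pipe where GridFTP would carry the bytes. All paths are throwaway temp dirs.
work=$(mktemp -d)
mkdir -p "$work/src/TarIt" "$work/dst/UnTarIt"
echo "hello" > "$work/src/TarIt/small-file.txt"

# Source side: tar -C <parent> -cf - TarIt; destination side: tar -xf - -C <dir>
tar -C "$work/src" -cf - TarIt | tar -xf - -C "$work/dst/UnTarIt"

# The directory should now exist under the destination.
ls "$work/dst/UnTarIt/TarIt"
```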

The command lines tend to get long and ugly, and they are not for the faint of heart, but for a sysadmin looking to experiment, and willing to write small scripts around globus-url-copy for their own uses, this can be an incredibly powerful extension.
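As one example of such a small script, here is a sketch that only prints the tar on the fly command line for a given directory, so you can eyeball it before running anything. The function name tar_copy_command and its argument order are invented for this post. The /x and /y URL paths are the same placeholders used above.

```shell
# Hypothetical wrapper: print (do not run) the globus-url-copy command that
# tars up directory $2 found under $1 on the source and unpacks it into $3
# on the destination. $4 is the server URL prefix, e.g. ftp://localhost:5000
tar_copy_command() {
    src_parent=$1 dir_name=$2 dst_dir=$3 server=$4
    src_stack="popen:argv=#/bin/tar#-C#${src_parent}#-cf#-#${dir_name}"
    dst_stack="popen:argv=#/bin/tar#-xf#-#-C#${dst_dir}"
    # The file parts of both URLs are placeholders; tar decides what is read and written.
    printf 'globus-url-copy -src-fsstack %s -dst-fsstack %s %s/x %s/y\n' \
        "$src_stack" "$dst_stack" "$server" "$server"
}

# Reproduces the command line from the tar on the fly example.
tar_copy_command /home/bresnaha TarIt /home/bresnaha/UnTarIt ftp://localhost:5000
```

Printing instead of executing keeps the sketch harmless. Once the output looks right, paste it into your shell or have your own script run it.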

Try it out with any other program that can handle pipes!