GridFTP and Yoga
Here in the land of GridFTP development we try to be as flexible
and extensible as possible. We created an IO library built entirely
around this idea, called GlobusXIO. This has allowed us to add new
protocols to GridFTP, like UDT and Phoebus, much more easily.
It has also allowed us to achieve greater-than-network speeds through
the use of compression.
While that has provided us with karate-man center splits type flexibility,
it was not enough for us. We also created the DSI (Data Storage
Interface) to GridFTP. Another dynamically loadable plug-in to GridFTP!
This has allowed GridFTP servers to be front ends
to complex filesystems like SRB and HPSS,
and further, allowed us to turn a single GridFTP server into a
striped GridFTP server as well as a load balancing dynamic backend server.
Ah, now we are starting to feel the burn. Yet even with two places for
dynamically loadable modules, we did not feel plugged-in enough. We wanted
to give our admins more knobs to turn, and excited contributing developers
more levers to pull. So we perused our code for proper places to carve out
clean abstractions. After months of discussing this over many beers we
ultimately found the right spot: the ECM
(event callout mumble--the plugin formerly known as ACL). We had a few cups
of coffee and then wrote the interface that allowed
NeST to control disk storage reservations and allowed for interaction with
CAS and other AUTHZ services.
Now we are starting to get our Bird of Paradise on!
But the problem here is, these are all supported features. That's no fun.
They will all just work, and if they don't, one only needs to send us email
for help. What about the poor hacker out there looking to tinker? What
have we left for the poor souls who are not happy unless they can
subversively trick a web server into being a TCP router with long unix pipe
commands? Those poor people can't deal with known and easy-to-use
configuration options! T-Rex wants to hunt!
Well, because we care about the hacker (and because we wanted to leverage
some quality tools like ssh) we created the GlobusXIO popen (pipe open)
driver.
The popen driver allows users to open pipes to the standard IO of
existing programs. This allows you to leverage programs much in
the way that you can with UNIX pipes (actually, EXACTLY in that way).
So say you want to tar up a directory on the fly and send it via
GridFTP? No problem, pipe it through /bin/tar! Want to compress
a file before sending it, you say? Pipe that sucker through zip.
What's that? You want your ASCII text files to appear as though they
were written by The Swedish Chef?
We are here for you! Just popen /usr/games/chef and you're speaking fluent
Muppet, baby!
We present the following examples here
Here we will show you how to do the following:
- translate English text to foreign languages (at the source)
- compress a file (at the destination)
- tar a directory on the fly (both sides)
Disclaimer
So first and foremost, none of this is really supported. It is a neat trick
you can do with Globus GridFTP servers. It will likely work, and work every
time, but it is a bit wacky to support on every system, and it could cause
security headaches (popen lets authenticated users run stuff remotely).
For these reasons we leave this to ambitious, fun-loving sysadmins to play
with, and will work with other users in more supported ways to meet their
needs.
Whitelist the popen driver
For security reasons you must explicitly allow any xio
driver that will run inside the GridFTP server. You can't have remote
clients inserting just any dynamically loadable module, now can you?
To allow the globus-gridftp-server to load the popen driver, use the
-fs-whitelist option. This tells the server which globus xio modules it
can load. For our purposes here use:
globus-gridftp-server -fs-whitelist popen,file,ordering
This will allow clients to use the default file driver (for doing
regular stuff), and the popen driver for the fancy stuff here.
You will also notice that the ordering driver is listed. It
is there because data sent to a pipe needs to be in order.
GridFTP data streams often arrive out of order, and when writing to a
regular file we seek into the file to put each block in its place.
Unfortunately we cannot seek into popen'ed programs.
But fortunately we have the ordering driver, which will (up to the
limits of memory) re-order things for us.
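To get a feel for why order matters to a pipe, here is a toy model in plain shell. It is only an illustration under stated assumptions, not how the ordering driver is actually implemented: the "offset:data" line format and the function names reorder and arrived are made up for this sketch.

```shell
# reorder: toy model only, NOT the ordering driver's implementation.
# Reads "offset:data" lines in any order and writes the data in offset order.
reorder() {
    sort -t: -k1,1n | cut -d: -f2- | tr -d '\n'
}

# arrived: made-up blocks as they might show up from parallel streams.
arrived() {
    printf '%s\n' '6:world' '0:hello ' '11:!'
}

# What a program on a bare pipe would read: blocks in arrival order.
arrived | cut -d: -f2- | tr -d '\n'; echo

# What it reads once the blocks are put back in offset order.
arrived | reorder; echo
```

The first line printed is scrambled, the second reads "hello world!". A file can be fixed up with a seek; a program reading its standard input cannot, which is why something has to buffer and sort the blocks first.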
English to Chef
Run the globus-gridftp-server as mentioned above. For simplicity's sake we
will run it in anonymous mode (running it with GSI security is well
documented elsewhere):
% globus-gridftp-server -p 5000 -fs-whitelist popen,file -aa
And now run the client:
% globus-url-copy -src-fsstack popen:argv=#/usr/games/chef#/home/bresnaha/text.txt ftp://localhost:5000/x ftp://localhost:5000/home/bresnaha/trans.txt
Let's take a look at this command line. It is the normal globus-url-copy
2 url command line with 1 additional argument:
-src-fsstack popen:argv=#/usr/games/chef#/home/bresnaha/text.txt
This argument tells the source (-src) server to set its
file system xio stack (-fsstack) to use the popen (popen:) driver, with the
following # separated argument list: /usr/games/chef /home/bresnaha/text.txt.
Instead of sourcing data from the file named in the source url, the server
sources it from the output of the popen'ed program:
% /usr/games/chef /home/bresnaha/text.txt
This makes the file portion of the source url irrelevant (but it still
needs to be there, thus the /x in our example).
In our example the file text.txt is sent to the destination file trans.txt.
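Typing those # separators by hand gets old fast. A minimal sketch of a helper that builds the stack string from an ordinary argument list follows. The name popen_stack is hypothetical (it is not part of any Globus tool), and the sketch assumes none of the arguments contain a # or a comma, since it makes no attempt to escape them.

```shell
# popen_stack: hypothetical helper, not part of the Globus toolkit.
# Joins its arguments into the "#"-separated form shown above.
# Assumption: no argument contains '#' or ','; nothing is escaped.
popen_stack() {
    stack="popen:argv="
    for arg in "$@"; do
        stack="${stack}#${arg}"
    done
    printf '%s\n' "$stack"
}

# Build the string used in the chef example.
popen_stack /usr/games/chef /home/bresnaha/text.txt
```

The result can then be passed along, for example as -src-fsstack "$(popen_stack /usr/games/chef /home/bresnaha/text.txt)".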
Compress a file at the destination
Say you want to send a file, but on the remote end you have a limited
amount of disk space and you need to store it in a compressed format.
Here we show you how to send a file using GridFTP and on the destination
end compress it using zip before it is written to disk (note: it
probably makes more sense to compress it before sending it, so just
take this as the example it is).
Again we start the server in a similar way, but this time we need
the ordering driver whitelisted:
% globus-gridftp-server -p 5000 -fs-whitelist popen,file,ordering -aa
And now run globus-url-copy, this time telling it to change the destination
stack to use the popen driver, running /usr/bin/zip with standard input
as the source and the file /home/bresnaha/text.txt.zip as the output:
% globus-url-copy -dst-fsstack popen:argv=#/usr/bin/zip#/home/bresnaha/text.txt.zip#-,ordering ftp://localhost:5000/home/bresnaha/text.txt ftp://localhost:5000/y
This time notice that we used -dst-fsstack instead of -src-fsstack (for
obvious reasons) and that the destination url has a fake file because
this time the output file is taken care of by the popen'ed program. Also
notice the addition of the ordering driver. By putting it after a comma
following the popen string we add it above the popen driver on the stack
and thus ensure that zip will see all of the data in order.
Tar on the fly!
The tar on the fly example shows how to use a pipe on both ends at the
same time, and solves the problem of transferring many VERY small files
at the same time. To do this, start, again, by running the server:
% globus-gridftp-server -p 5000 -fs-whitelist popen,file,ordering -aa
Now all we need to do is tell the source server to tar it up and the
destination to untar it:
% globus-url-copy -src-fsstack popen:argv=#/bin/tar#-C#/home/bresnaha#-cf#-#TarIt -dst-fsstack popen:argv=#/bin/tar#-xf#-#-C#/home/bresnaha/UnTarIt ftp://localhost:5000/x ftp://localhost:5000/y
This time you will notice that both the source and destination files are bogus.
This is because the tar program has already been told where the data
comes from via -src-fsstack and where it goes via -dst-fsstack.
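If the two tar invocations look cryptic, it can help to try the same pair locally with an ordinary pipe standing in for the GridFTP transfer. This is just a sketch: the scratch directory and the file names a.txt and b.txt are stand-ins invented for the demo, and plain tar from the PATH is used instead of /bin/tar.

```shell
# Scratch area with stand-in source and destination directories.
work=$(mktemp -d)
mkdir -p "$work/src/TarIt" "$work/dst/UnTarIt"
echo "one" > "$work/src/TarIt/a.txt"
echo "two" > "$work/src/TarIt/b.txt"

# Left side mirrors the source stack:      tar -C <parent> -cf - TarIt
# Right side mirrors the destination one:  tar -xf - -C <target>
# The pipe plays the role of the GridFTP data channel.
tar -C "$work/src" -cf - TarIt | tar -xf - -C "$work/dst/UnTarIt"

# The directory comes out as <target>/TarIt.
ls "$work/dst/UnTarIt/TarIt"
```

Note that the directory given to -C on the extracting side has to exist already, so the same should hold for the destination directory in the real transfer.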
The command lines tend to get long and ugly, and they are not for
the faint of heart, but for a sysadmin looking to experiment, and willing
to write small scripts around globus-url-copy for their uses, this can
be an incredibly powerful extension.
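As a rough idea of what such a small script might look like, here is a sketch of a wrapper for the tar on the fly transfer. Everything about it is an assumption made for illustration: the name tar_copy, its argument order, and the DRY_RUN switch are invented here, and the /x and /y file names are the same placeholders used in the example above.

```shell
# tar_copy: hypothetical wrapper, not part of the Globus toolkit.
# usage: tar_copy SRC_PARENT DIR_NAME DST_DIR SRC_HOST:PORT DST_HOST:PORT
# Set DRY_RUN=1 to print the command instead of running it.
# Assumption: no argument contains '#' or ','; nothing is escaped.
tar_copy() {
    src_stack="popen:argv=#/bin/tar#-C#${1}#-cf#-#${2}"
    dst_stack="popen:argv=#/bin/tar#-xf#-#-C#${3}"
    # Rebuild the argument list as the full globus-url-copy command line.
    set -- globus-url-copy \
        -src-fsstack "$src_stack" \
        -dst-fsstack "$dst_stack" \
        "ftp://${4}/x" "ftp://${5}/y"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Print the command from the tar example without running anything.
DRY_RUN=1 tar_copy /home/bresnaha TarIt /home/bresnaha/UnTarIt \
    localhost:5000 localhost:5000
```

Checking the printed command in dry-run mode before letting it loose is a cheap way to catch a misplaced # in the stack strings.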
Try it out with any other program that can handle pipes!