Reliable File Transfer Service

This is an old prototype and is no longer supported. Please do not use it. RFT is now part of the Globus Toolkit. Please see the Globus web site for more details.

Description:

The Reliable File Transfer Service (RFT) is a service that allows byte streams to be transferred in a reliable manner. Reliability, in this context, means that problems below a certain user-defined magnitude are dealt with automatically. That is, problems such as dropped connections, machine reboots, and temporary network outages are handled automatically (usually via retry) until the transfer either resumes or meets some "ultimate failure" condition.

A PowerPoint presentation on RFT, given by Ravi Madduri at a University of Chicago Tech Talk, can be found here

Link to API

WSDL definition for RFT

reliable_transfer_service.wsdl
reliable_transfer_port_type.wsdl
reliable_transfer_bindings.wsdl
reliable_transfer_types.wsdl

General Architecture:

The general architecture of the Reliable File Transfer Service can be found here (in PowerPoint)

The Reliable File Transfer Service is built on top of the existing GridFTP client libraries, so it naturally inherits GridFTP's performance features and its restart support, which handle remote problems such as network outages, remote reboots, and remote servers falling over. With GridFTP alone, however, the client needs to remain active for as long as the transfer takes to finish. Loss of the client state machine requires a manual restart from scratch. This creates the need for a non-user-based service like the Reliable File Transfer Service, to which a user can submit a request for transferring a set of files and then free his or her local desktop or laptop. The transfer state is stored persistently so that, after any of the failures mentioned above, the transfer is restarted not from scratch but from the last restart marker recorded for that transfer.
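The idea of persisting transfer state and resuming from the last recorded restart marker can be sketched as follows. This is only an illustration, not RFT's actual code or schema: it uses Python's built-in sqlite3 module as a stand-in for the real database, the table and function names (restart_marker, record_marker, resume_offset) are hypothetical, and a restart marker is simplified to a single byte offset.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the marker store. sqlite3 is a stand-in database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS restart_marker ("
        " transfer_id TEXT PRIMARY KEY,"
        " bytes_done INTEGER NOT NULL)"
    )
    return conn


def record_marker(conn, transfer_id, bytes_done):
    """Overwrite the latest marker for a transfer (hypothetical helper)."""
    conn.execute(
        "INSERT OR REPLACE INTO restart_marker (transfer_id, bytes_done)"
        " VALUES (?, ?)",
        (transfer_id, bytes_done),
    )
    conn.commit()


def resume_offset(conn, transfer_id):
    """Return the offset to resume from, or 0 if no marker was recorded."""
    row = conn.execute(
        "SELECT bytes_done FROM restart_marker WHERE transfer_id = ?",
        (transfer_id,),
    ).fetchone()
    return row[0] if row else 0
```

Because the marker lives in the database rather than in the memory of a client process, a restarted transfer can ask for resume_offset and continue from there instead of from byte zero.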

The Reliable File Transfer Service (RFT) consists of the following components:
1. The Reliable File Transfer Service, which accepts the transfer requests, is written in Java. It is a multi-threaded service listening on a pre-configured port. The service communicates with external clients that submit transfer requests via XML/SOAP. This design decision was made to keep the service independent of the programming language in which the client is written: the service can accept requests from clients written in any language (C and Python, to name two) as long as they understand XML. The service implements the reliability part of RFT by keeping track of the progress of all transfers. It accepts transfer requests from the Control Client GUI, along with a concurrency attribute (the number of transfers that should run simultaneously), and stores them in a database. The service invokes the transfer client with the necessary arguments to start a third-party FTP transfer. The service itself is highly configurable. The failure recovery mechanism is described below.
The Failure Recovery Mechanism is the mechanism by which RFT recovers from failures such as server crashes and network outages. After forking off the transfer client, the service monitors the transfer by waiting on the transfer client. If the client returns a fatal error (for example, when the source or destination URL is not valid), the transfer is impossible and the service will not restart it. If the client returns a non-fatal error (anything from a crashed server to a network outage), the service restarts the transfer from the point where it failed. The number of times the service tries to restart a transfer can be configured before starting the service; it can be a finite number, or the service can be configured to retry forever. The service also connects to the Netlogger service to publish the performance markers, which are archived. This information can help in analyzing the performance of various transfers, and may help a layer above RFT make intelligent decisions based on the archived data.
2. The Transfer Request Control GUI is an example GUI written in Java. It is used to submit transfer requests to the service; it also receives status updates from the service and displays them so the user knows the present state of each transfer. We have written an event framework for dynamically updating the status of transfers in the GUI. The user can submit a batch of transfers as a single request to the service via the GUI. The user can also specify how many files to transfer at a time (concurrency) and a friendly name to represent the set of transfers. Another important feature of the client is that it can connect dynamically to different transfer services on different machines and monitor the requests submitted to those services. The user can kill the client and bring it up later to see the status of the transfers. The control client also allows the user to cancel a transfer at any point. We are currently adding more functionality to the Request GUI, such as canceling a whole transfer request and cleaning up the database.
3. The Transfer Client is a C binary, built on top of the GridFTP libraries, that actually performs the transfers. The transfer client is forked off by the service with the information required to do a third-party FTP transfer. As the transfer proceeds, the transfer client connects to the database and stores the restart markers (checkpoints given out by the servers that indicate how much data has already been transferred) in the database every 5 seconds. It is from these restart markers that the service rebuilds the state of a transfer after a crash or a network outage. The next time the service tries to resume a failed transfer, the transfer client checks for the latest restart marker in the database and transfers only the remaining part instead of starting the transfer again from scratch. The transfer client also publishes performance markers to the Netlogger service.
4. Direct Controls GUI: This GUI, written in Java, is used to adjust the number of parallel streams for an active transfer. The user can also adjust the TCP buffer size for the active transfer. It works by stopping the current transfer, breaking the connection, opening a new connection with the new number of parallel streams and/or TCP buffer size, and resuming the transfer from the point where it was stopped. The user can experiment to find the optimum values for the TCP buffer size and the number of parallel streams.
5. Performance Graph GUI: This GUI, written in Java, displays a performance graph of throughput against time. Using it, the user can visually monitor the progress of an active transfer and also get performance information for both active and completed transfers. The user can change the scale of the graph and zoom into a selected portion of it for a closer look.
6. Netlogger Service: The transfer client publishes the performance markers to the Netlogger service, which archives the data and also provides it to the Performance Graph GUI, which in turn converts the data into performance graphs. The RFT service needs to be configured to use the Netlogger service so that the transfer client publishes data to it.
7. Database: This stores the state of all the transfers so that it can be retrieved in case of a failure. We are currently using PostgreSQL as our database.
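The failure recovery behaviour described under component 1 (no restart on a fatal error, restart on a non-fatal error up to a configured limit or forever) can be sketched as a small supervision loop. This is a minimal illustration under stated assumptions, not the service's actual Java code: the exception classes, the supervise function, and its max_restarts parameter are hypothetical names, and a Python callable stands in for the forked transfer client.

```python
class FatalTransferError(Exception):
    """The transfer can never succeed (for example, an invalid URL)."""


class NonFatalTransferError(Exception):
    """The transfer failed for a reason that may clear up (crash, outage)."""


def supervise(run_transfer, max_restarts=None):
    """Run one transfer, restarting it after non-fatal errors.

    run_transfer: callable that performs one attempt and raises on failure
                  (hypothetical stand-in for waiting on the transfer client).
    max_restarts: restarts allowed; None means keep retrying forever.
    Returns (status, restarts), where status is "done", "failed_fatal",
    or "gave_up".
    """
    restarts = 0
    while True:
        try:
            run_transfer()
            return "done", restarts
        except FatalTransferError:
            # The transfer is impossible, so it is not restarted.
            return "failed_fatal", restarts
        except NonFatalTransferError:
            if max_restarts is not None and restarts >= max_restarts:
                return "gave_up", restarts
            restarts += 1
```

In this sketch each restart simply calls run_transfer again; in RFT the restarted transfer client would additionally pick up from the latest restart marker in the database rather than from the beginning.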

If you feel bogged down by the details and would prefer a picture instead, you can find it here