3rd CEDPS Data Challenge: Single Globus.org Instance, Multiple Users, Multiple Sites

Table of Contents

Key Findings
Overview
Performance Characteristics
Code Defects
Troubleshooting
Downloads
Run #12: 11 Users, 12 Sites, 100,000 Files
Overview
NERSC Details
OLCF Details
PADS Details
abe, ARCS, bigred, frost, lonestar, queenbee, ranger, and steele Details
Run #11: 11 Users, 12 Sites, 100,000 Files
Run #10: 11 Users, 12 Sites, 1,375 Files
Run #9: 11 Users, 12 Sites, 1,375 Files
Run #8: 11 Users, 12 Sites, 1,375 Files
Run #7: 10 Users, 11 Sites, 1,000 Files
Run #6: 10 Users, 11 Sites, 1,000 Files
Run #5: 10 Users, 11 Sites, 1,000 Files
Run #4: 10 Users, 2 Sites, 100,000 Files
Run #3: 6 Users, 2 Sites, 60,000 Files
Run #2: 3 Users, 2 Sites, 30,000 Files
Run #1: 1 User, 2 Sites, 10,000 Files

Key Findings

    Overview
  1. The 3rd CEDPS challenge requirements (multiple users transferring 100,000 200MB files across multiple sites within 3 days using a single Globus.org instance) have been met as outlined in Run #12. Early work on this challenge included repeating transfers across the same infrastructure while varying the number of users (Runs #1-4). Work midway included transferring a modest number of files across multiple sites and examining event data (Runs #5-10). The work concluded by varying transfer workloads, with some users sending high-volume requests at the same time as others sending modest requests (Runs #11-12); improvements to the Globus.org event reporting mechanism between Runs #10 and #11 enabled high-volume event data to be examined. Key findings from this challenge are organized under three topics: performance characteristics, code defects, and troubleshooting:

    Performance Characteristics
  2. The performance any given user receives from Globus.org depends heavily on the load on the Globus.org server, on the underlying infrastructure (endpoints, network, etc.), or on both. Consider the differences in elapsed time experienced by various Globus.org users transferring 10,000 files over the same infrastructure during Runs #1-4. The user-specific comparisons (first user in the run to submit a transfer request, last user to submit a request) underscore that individual users experience markedly different performance depending on current system load:

  3. We hypothesize that the single-user Run #1 achieved a lower average throughput (1.2 Gbps) than the multi-user Runs #2-4 (~2 Gbps) in part due to constraints imposed by Globus.org's fair use logic.
  4. We found that the average throughput experienced by users transferring 125 files did not appreciably degrade during the 100,000-file runs, suggesting that Globus.org's fair use logic limits the impact of large transfer jobs on other users:


  5. The following charts present a cross-site comparison of the average number of reported transfer attempts per file transferred in Runs #11 and #12. We note that a high transfer attempt average may be a good indicator of runtime troubles:


  6. An examination of the event logs for the first 49 OLCF transfers in Run #12 revealed an odd pattern. The chart below shows events on the y-axis and time on the x-axis; each panel represents a single file being transferred. Beginning with the first file in the lowest leftmost panel, the number of attempts increases by one for each successive transfer. Note that we have been unable to reproduce the same apparent behavior in subsequent experiments (during which we temporarily induced the same FILE_ACCESS error on a different endpoint).



    Code Defects
  7. Broken endpoint definitions were created when a user sent a malformed endpoint URL to the manage-endpoint command [Malformed endpoint url triggers traceback and creates dummy endpoint]
  8. A failure condition was triggered during Runs #3 and #4 in which some users did not receive transfer completion notification emails [Server too busy to send notification emails?]
  9. The event-logs output was edited prior to generating the "Event Summary" graphs for Runs #6-9 because commas embedded in Globus.org-generated text were interfering with CSV file interpretation. The "-d" option of the event-logs command was used in Runs #10-12 to override the default delimiter with "^". Either the default delimiter should be changed or commas should be removed from Globus.org-generated text [Remove commas from text returned by the event-logs command]
  10. When specifying the debug field on the event-logs command line for Runs #10-11, newlines were embedded in the output even when the -b option was specified [Imbedded newlines in event-logs debug output]
  11. Out-of-band md5sum data integrity checks for Run #11 uncovered a 0-length file at PADS that Globus.org had marked as being successfully transferred. Information exposed to the user about this transfer is collected here. Data integrity checks for all other files in the run succeeded [Data integrity error after 100k transfer job] [Checksum verification after the transfer is completed]
  12. The status command reported incorrect "bytes" values for Runs #11 and #12. 13 NERSC requests and 294 PADS requests from Run #11 have bad "bytes" values; data integrity checks succeeded for these files. For Run #12, incorrect values were reported for 2 bigred requests, 6 NERSC requests and 270 PADS requests; data integrity checks succeeded for these files as well [Transfer metadata corruption: status command reports bad 'bytes' values]
  13. Suboptimal request processing behavior was observed during an exploratory single-user, 8-file, 8-site test: Globus.org executed multiple attempts against a problematic endpoint while ignoring requests involving other endpoints [Run with 0 attempts on good sites and 22 attempts on a bad site]

    Troubleshooting
  14. Suboptimal error detection is evident in Runs #6 and #11
  15. Suboptimal error diagnosis is evident in Runs #7-12
  16. Users are unable to assess the efficiency of Globus.org transfers because they cannot see the achievable bandwidth for their source/destination links
  17. Users are unable to determine the sources of bottlenecks (Globus.org, a network link, source endpoint, destination endpoint, etc.), making troubleshooting performance issues a time-consuming, cross-administrative domain sleuthing effort
  18. "Disk quota exceeded" events are classified as UNKNOWN, forcing the user to dig deeply into debug output to diagnose a problem that is both common and well-understood [Reduce the number of events classified as UNKNOWN]
  19. 65% of the transfer attempts in Run #11 resulted in events categorized by Globus.org as UNKNOWN [Reduce the number of events classified as UNKNOWN]
  20. BATCH_ERROR events are an artifact of the current Globus.org implementation; describing an event to the user by its effect on Globus.org internals does not facilitate problem resolution [Recast BATCH_ERROR events into something meaningful to users]
  21. Had the NERSC user in Run #11 not proactively queried progress and further investigated the problem causing transfer stalls (disk quota exceeded), the NERSC requests would likely have expired instead of succeeding [Notify users of interesting events]
  22. Had the NERSC and PADS users forgotten to renew their certificates during Runs #11-12, some of their requests would have expired [Notify users when their credentials expire]
  23. Out-of-band communication with a NERSC admin was required to find a workaround for the low throughput problem observed in Runs #5 and #6
  24. Out-of-band communication with a TeraGrid admin was required when the frost user's disk quota was exceeded during an exploratory test run; user intervention was required to manually cancel transfers [Enable endpoint admins to configure local response to Globus.org retries] [Notify users of interesting events]
  25. Out-of-band communication with a TeraGrid admin was required to diagnose the host cert problem captured in the bigred event logs in Runs #8 and #9
  26. Manual tests and out-of-band communication with an OLCF admin were required to isolate and report the filesystem errors in Runs #8 and #9
  27. Out-of-band follow-up with OLCF and PADS admins will be required to troubleshoot the high number of transfer attempts involving their sites
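The delimiter problem described under Code Defects can be illustrated with a minimal Python sketch. The three-field record layout and the sample text are hypothetical (this report does not document the real event-logs columns); the only detail taken from the report is that "^" was used as the delimiter in Runs #10-12.

```python
import csv
import io

# Hypothetical event record: time, event code, free-text description.
# The real event-logs column layout is an assumption, not taken from the report.
SAMPLE = "12:00:00^UNKNOWN^Disk quota exceeded, retrying\n"


def parse_events(text, delimiter="^"):
    """Split delimited event-log text into a list of field lists."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter,
                        quoting=csv.QUOTE_NONE)
    return list(reader)


# With "^" as the delimiter the description survives as a single field.
print(len(parse_events(SAMPLE)[0]))  # 3

# The same record written with commas splits the description in two.
comma_text = SAMPLE.replace("^", ",")
print(len(parse_events(comma_text, delimiter=",")[0]))  # 4
```

A delimiter that never occurs in generated text keeps the field count stable, which is what plotting scripts that index columns by position rely on.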

Downloads

Links to the transfer metadata and event logs for each run are imbedded in descriptive text throughout the report; compilations of the data are also available for download here:
[Transfer metadata] [Event logs] [Debug output] [Plots]


Run #12: Eleven Concurrent Users, Twelve Sites, One Hundred Thousand Files

Run #12 Overview

[Globus.org code version] [submission script] [activation commands]

Eleven users submitted requests to transfer a total of 100,000 200MB files from ALCF. Each user sent files to a different site. DOE, NSF and international destinations were involved: 49,000 files each to NERSC and PADS, 1,000 files to OLCF, and 125 files each to abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele. The charts below are based on a composite of the data in the Run #12 Details subsections; they show the time elapsed for every transfer in Run #12, as well as the number of transfer attempts for each file. All 100,000 transfers in the run succeeded (checksums verified out-of-band) after 14 hours 13 minutes for an average throughput of 3.1 Gbps. 421,977 transfer attempts (mean 4.2, median 1) were executed while trying to fulfill the eleven users' requests:

The chart below provides an overview of all events recorded by Globus.org during Run #12; this composite was built from the event data in the Run #12 Details subsections. The 100,000 successful transfers are represented by the purple bar. There were 421,977 transfer attempts (depicted as blue "start" events). A variety of additional events were recorded during the run: 32,610 abnormal ends (red bar), 7,739 "batch" errors (green bar), 379 CA errors, 24,414 endpoint errors (yellow bar), 114,408 file access errors (green bar), and 141,906 unknowns (magenta bar). More detailed debug logs are linked below.
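The average throughput figures quoted for each run appear to follow from file count, file size, and elapsed time. The sketch below reproduces that arithmetic under the assumption that 1 MB is 10^6 bytes and 1 Gbps is 10^9 bits per second; the function name is illustrative only.

```python
def average_throughput_gbps(files, file_size_mb, hours, minutes=0):
    """Average throughput in Gbps, assuming 1 MB = 10**6 bytes."""
    total_bits = files * file_size_mb * 10**6 * 8
    elapsed_seconds = hours * 3600 + minutes * 60
    return total_bits / elapsed_seconds / 10**9


# Run #12 overall: 100,000 files of 200MB in 14 hours 13 minutes.
print(round(average_throughput_gbps(100_000, 200, 14, 13), 1))  # 3.1

# Run #12, NERSC only: 49,000 files in 12 hours 55 minutes.
print(round(average_throughput_gbps(49_000, 200, 12, 55), 1))   # 1.7
```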

Run #12 Details: NERSC

The NERSC transfer metadata for Run #12, extracted with the status command, can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed until each individual NERSC transfer request succeeded. All 49,000 files were successfully transferred after 12 hours 55 minutes for an average throughput of 1.7 Gbps. A total of 49,588 attempts (mean 1, median 1) was recorded during the run:

A drilldown inspection of the NERSC events for Run #12 was made possible by querying the event logs; the data are here. The logs reveal that all 49,000 NERSC requests succeeded (red bar) after a total of 49,588 attempts (green bar) and 394 "batch" error events. Debug output for the NERSC transfers can be found here.

Run #12 Details: OLCF

The OLCF transfer metadata for Run #12, extracted with the status command, can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each individual OLCF transfer during Run #12. All 1,000 files were successfully transferred after 2 hours 40 minutes for an average throughput of 166 Mbps. A total of 115,454 attempts (mean 115.5, median 138) was recorded during the run:

A drilldown inspection of the OLCF events for Run #12 was made possible by querying the event logs; the data are here. The logs reveal that all 1,000 OLCF requests succeeded (green bar) after a total of 115,454 attempts (blue bar). 114,408 file access errors (red bar) were recorded during the run. The debug output provides more detail, including filenames and failure times (in UTC).

Run #12 Details: PADS

The PADS transfer metadata for Run #12, extracted with the status command, can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each individual PADS transfer during Run #12. All 49,000 files were successfully transferred after 14 hours 12 minutes for an average throughput of 1.5 Gbps. A total of 255,881 attempts (mean 5.2, median 4) was recorded during the run:

A drilldown inspection of the PADS events for Run #12 was made possible by querying the event logs. An inspection of the event data reveals that all 49,000 PADS requests reportedly succeeded (blue bar), and a total of 255,881 attempts (purple bar) occurred in the course of fulfilling those requests. A variety of additional events were recorded during the run: 32,610 abnormal ends (red bar), 7,291 batch errors (magenta bar), 379 CA errors, 24,414 endpoint errors and 141,906 unknowns (cyan bar). Debug information about the PADS run is also available for inspection.

Run #12 Details: 125-File Sites

Metadata describing the 125-file runs (extracted on a per-user basis with the status command) can be found here: abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele. The following two charts are based on composites of these data. The lefthand chart shows the time elapsed for each individual file, grouped by destination/user. All 1,000 files were successfully transferred after 40 minutes for an average throughput of 663 Mbps. A total of 1,054 attempts (mean 1, median 1) was recorded during the run:

A drilldown inspection of the events for the 125-file destinations in Run #12 confirms that all 1,000 requests succeeded (red bar) after 1,054 attempts (green bar) and 54 "batch" errors. Individual event logs for the sites are here: abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele; the composite log is here. Debug output for queenbee can be found here.
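As a consistency check, the per-site figures in the Run #12 Details subsections can be summed and compared against the composite totals in the Run #12 Overview. The sketch below uses only numbers quoted in this section; the variable names are illustrative.

```python
# Per-site Run #12 figures as quoted in the Details subsections.
FILES = {"NERSC": 49_000, "OLCF": 1_000, "PADS": 49_000, "125-file sites": 1_000}
ATTEMPTS = {"NERSC": 49_588, "OLCF": 115_454, "PADS": 255_881, "125-file sites": 1_054}
BATCH_ERRORS = {"NERSC": 394, "PADS": 7_291, "125-file sites": 54}

total_files = sum(FILES.values())
total_attempts = sum(ATTEMPTS.values())

print(total_files)                                # 100000
print(total_attempts)                             # 421977
print(round(total_attempts / total_files, 1))     # 4.2 attempts per file
print(sum(BATCH_ERRORS.values()))                 # 7739

# Attempts per file by site show where the retries concentrated.
for site, attempts in ATTEMPTS.items():
    print(f"{site}: {attempts / FILES[site]:.1f}")
```

The overall mean of 4.2 sits well above the median of 1 because most of the retries come from OLCF and PADS.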

Run #11: Eleven Concurrent Users, Twelve Sites, One Hundred Thousand Files

Run #11 Overview

[Globus.org code version] [submission script] [activation commands]

Eleven users submitted requests to transfer a total of 100,000 200MB files from ALCF. Each user sent files to a different site: 49,000 files each to NERSC and PADS, 1,000 files to OLCF, and 125 files each to abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele. The charts below are composites of the data in Run #11 Details. The lefthand chart shows the time elapsed for each transfer in Run #11, ordered by transfer time. All transfers reportedly succeeded after 17 hours 15 minutes for an average throughput of 2.6 Gbps. 823,715 attempts (mean 8.2, median 1) were executed while trying to fulfill the eleven users' requests:

The chart below provides a composite view of all events recorded by Globus.org during Run #11. The successful transfers are represented by the blue bar. There were 823,715 transfer attempts (depicted as purple START events). A variety of additional events were recorded during the run: 64,620 abnormal ends (red bar), 8,106 "batch" errors (magenta bar), 114,542 file access errors (green bar), 1,065 CA errors (yellow bar), and 534,981 unknowns (cyan bar). Detailed event debug information is included in the site-specific Run #11 Details subsections below.
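The share of attempts that ended in each event category can be derived directly from the Run #11 counts above. The following sketch is one way to do so (names are illustrative); it reproduces the roughly 65% UNKNOWN share cited in the Key Findings.

```python
# Run #11 composite event counts as quoted above.
ATTEMPTS = 823_715
EVENTS = {
    "abnormal end": 64_620,
    "batch error": 8_106,
    "file access error": 114_542,
    "CA error": 1_065,
    "unknown": 534_981,
}


def share(count, total):
    """Percentage of `total` represented by `count`."""
    return 100 * count / total


# List categories from most to least frequent.
for name, count in sorted(EVENTS.items(), key=lambda item: -item[1]):
    print(f"{name}: {share(count, ATTEMPTS):.1f}%")

print(round(share(EVENTS["unknown"], ATTEMPTS), 1))  # 64.9
```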

Run #11 Details: NERSC

The NERSC transfer metadata for Run #11 (extracted with the status command) can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed until each individual NERSC transfer request succeeded. All 49,000 files were successfully transferred after 17 hours 14 minutes for an average throughput of 1.3 Gbps. A total of 365,801 attempts (mean 7.5, median 1) was recorded during the run:

A drilldown inspection of the NERSC events for Run #11 was made possible by querying the event logs; the event-logs command ran for 8 minutes 22 seconds and produced this file. The data reveal that all 49,000 NERSC requests succeeded (cyan bar) amid a total of 365,801 attempts (blue bar). Three additional event types were recorded during the run: 632 batch errors (magenta bar), 2,931 file access errors (red bar), and 313,023 unknowns (yellow bar). Digging further in the debug output (events11-nersc.txt) shows that the majority of the unknown events are attributable to user error (disk quota exceeded).

Run #11 Details: OLCF

The OLCF transfer metadata for Run #11, extracted with the status command, can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each individual OLCF transfer during Run #11. All 1,000 files were successfully transferred after 3 hours 48 minutes for an average throughput of 116 Mbps. A total of 110,445 attempts (mean 110.4, median 105) was recorded during the run:

A drilldown inspection of the OLCF events for Run #11 was made possible by querying the event logs. The event-logs command ran for 1 minute 46 seconds, producing this file. The data reveal that all 1,000 OLCF requests succeeded (green bar) after a total of 110,445 attempts (blue bar). 109,445 file access errors (red bar) were recorded during the run. The debug output provides more detail about failure times and filenames (all times UTC): events11-olcf.txt.

Run #11 Details: PADS

The PADS transfer metadata for Run #11 (extracted with the status command) can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each individual PADS transfer during Run #11. The files were transferred after 16 hours 29 minutes for an average throughput of 1.3 Gbps. A total of 346,469 attempts (mean 7.1, median 5) was recorded during the run:

A drilldown inspection of the PADS events for Run #11 was made possible by querying the event logs. The event-logs command ran for 7 minutes 25 seconds and produced this file. An inspection of the event data reveals that all 49,000 PADS requests reportedly succeeded (blue bar), and a total of 346,469 attempts (purple bar) occurred in the course of fulfilling those requests. A variety of additional events were recorded during the run: 64,620 abnormal ends (red bar), 7,474 batch errors (magenta bar), 2,166 file access errors (green bar), 1,065 CA errors (yellow bar), and 221,958 unknowns (cyan bar). Debug information about the PADS run is available for inspection: events11-pads.txt.

Run #11 Details: 125-File Sites

Metadata describing the 125-file runs (extracted on a per-user basis with the status command) can be found here: abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele. The following two charts are based on a composite of these data. The lefthand chart shows the time elapsed for each individual file, grouped by destination/user. All 1,000 files were successfully transferred after 39 minutes for an average throughput of 678 Mbps. A total of 1,000 attempts (mean 1, median 1) was recorded during the run:

A drilldown inspection of the events for the 125-file destinations in Run #11 confirms that all 1,000 requests succeeded (red bar) on the first attempt (cyan bar). Composite event logs for the sites are here.
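The out-of-band integrity check that uncovered the 0-length file at PADS in Run #11 relied on md5sum. The sketch below shows the same idea in Python for two local paths; it is an illustration only, since the actual checks were presumably run separately at each site, and the function names and demo files are hypothetical.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(source_path, dest_path):
    """Return 'ok', 'zero-length destination', or 'checksum mismatch'."""
    if os.path.getsize(dest_path) == 0 and os.path.getsize(source_path) > 0:
        return "zero-length destination"
    if md5_of_file(source_path) != md5_of_file(dest_path):
        return "checksum mismatch"
    return "ok"


def demo():
    """Check a faithful copy and an empty copy of a small test file."""
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "source.dat")
        good = os.path.join(workdir, "good.dat")
        empty = os.path.join(workdir, "empty.dat")
        for path, payload in ((source, b"data"), (good, b"data"), (empty, b"")):
            with open(path, "wb") as handle:
                handle.write(payload)
        return verify_transfer(source, good), verify_transfer(source, empty)


print(demo())  # ('ok', 'zero-length destination')
```

Checking the size first gives a more specific diagnosis than a bare checksum mismatch for the failure mode seen in Run #11.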

Run #10: Eleven Concurrent Users, Twelve Sites, One Thousand Three Hundred and Seventy Five Files

[Globus.org version]

After IU updated their host certs, eleven users each submitted a request to transfer 125 200MB files from ALCF. The transfer metadata for Run #10 (extracted with the status command) can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each transfer, grouped by destination/user. All transfers succeeded after 52 minutes for an average throughput of 709 Mbps. 8,321 attempts (mean 6.1, median 1) were executed while trying to fulfill the eleven users' requests:

A drilldown inspection of the ARCS events for Run #10 reveals that all 125 requests succeeded (red bar) after 152 attempts (green bar); 27 "batch" events were also recorded. Debug information for the ARCS run is available for inspection.

A drilldown inspection of the OLCF events for Run #10 reveals that all 125 requests succeeded (green bar) after 6,228 attempts (cyan bar) were executed. Other recorded events include 6,047 file access errors (red bar) and 56 "batch" events. Debug information for the OLCF run is available for inspection.

A drilldown inspection of the PADS events for Run #10 reveals that a total of 941 attempts (purple bar) were executed during the run and all 125 requests succeeded (cyan bar). In addition, 287 "abnormal end" and 529 "unknown state" events (green bar) were recorded. Debug information for the PADS run is available for inspection.

Run #9: Eleven Concurrent Users, Twelve Sites, One Thousand Three Hundred and Seventy Five Files

[Globus.org version]

For Run #9 eleven users each submitted a request to transfer 125 200MB files from ALCF within a 2-hour deadline. The transfer metadata for Run #9 (extracted with the status command) can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each transfer, grouped by destination/user. No IU requests succeeded prior to the deadline due to expired host certs at the destination. Excluding IU, all transfers succeeded after 51 minutes for an average throughput of 658 Mbps. 9,889 attempts (mean 7.2, median 1) were executed while trying to fulfill the eleven users' requests:

A drilldown inspection of the bigred events from Run #9 reveals a total of 6,934 attempts (blue bar) were executed. All 125 requests expired (red bar) and 20,037 "unknown state" events (green bar) were recorded. As in Run #8, examination of available debug information reveals multiple "530 Must perform GSSAPI authentication" error messages. Out-of-band communications with the IU admin reveal that their GridFTP host certificates have expired:

A drilldown inspection of the OLCF events for Run #9 reveals that a total of 1,676 attempts (magenta bar) were executed. 125 of the attempts (green bar) succeeded, indicating that all 125 requests were eventually fulfilled. During the run 2,238 file access errors (red bar), 22 timeouts (blue bar), and 80 "unknown state" events (yellow bar) were recorded. As in Run #8, examination of available debug information reveals multiple "Unable to open file" error messages. An out-of-band test suggests an intermittent problem with OLCF's Lustre filesystem:

A drilldown inspection of the ranger events for Run #9 reveals that a total of 279 attempts (blue bar) were executed during the run. All 125 requests eventually succeeded. During the run 156 timeouts were recorded:

Run #8: Eleven Concurrent Users, Twelve Sites, One Thousand Three Hundred and Seventy Five Files

[Globus.org version]

Eleven users each submitted a request to transfer 125 200MB files from ALCF within a 2-hour deadline. The transfer metadata for Run #8 (extracted with the status command) can be found here. The two charts below show the time elapsed for each transfer, grouped by destination/user. Fewer than 1% of the IU and Purdue requests succeeded prior to the deadline. Excluding IU and Purdue, all transfers succeeded after 48 minutes for an average throughput of 622 Mbps. 11,216 attempts (mean 8.2, median 1) were made in pursuit of fulfilling the eleven users' requests:

A drilldown inspection of the bigred events for Run #8 reveals that a total of 6,523 attempts (blue bar) were executed. All 125 requests expired (red bar) and 18,674 "unknown state" events (green bar) were recorded. Examination of available debug information reveals multiple "530 Must perform GSSAPI authentication." error messages.

It took 5 minutes 2 seconds to extract the bigred event information using the event-logs command:

A drilldown inspection of the OLCF events for Run #8 reveals that a total of 1,576 attempts (purple bar) were executed. 125 of the attempts (cyan bar) succeeded, indicating that all 125 requests were eventually fulfilled. During the run 2,285 file access errors (red bar) and 86 "unknown state" events (green bar) were recorded. Examination of available debug information reveals multiple instances of the error message "500-globus_xio: Unable to open file //lustre/widow1/proj/csc024/childers/ddest/multi/cdc3/100Kfiles200M/2-1Kfiles200M/destination-filename 500-globus_xio: System error in open: Permission denied".

It took 4 minutes 12 seconds to extract the OLCF event information using the event-logs command:

A drilldown inspection of the steele events for Run #8 reveals that a total of 2,117 attempts (blue bar) were executed. 123 requests expired (red bar) and 2 requests succeeded. During the run 123 "batch errors" (magenta bar), 486 timeouts (cyan bar), and 2,868 "unknown state" events (yellow bar) were recorded. Examination of available debug information reveals 2,831 "error: an end-of-file was reached globus_xio: An end of file occurred" messages, and 37 "error: globus_ftp_client: the operation was aborted" messages.

It took 4 minutes 32 seconds to extract the steele event information using the event-logs command:

Run #7: Ten Concurrent Users, Eleven Sites (sans NERSC DTN02), One Thousand Files

[Globus.org version]

Ten users each submitted a request to transfer 100 200MB files from ALCF. All destinations were identical to Runs #5 and #6, with the exception that the NERSC node DTN02 was excluded. The two charts below show the time elapsed and number of attempts for each transfer, grouped by destination/user. All files were successfully transferred after 31 minutes for an average throughput of 865 Mbps. A total of 1,214 attempts (mean 1.2, median 1) was recorded during the run:

A drilldown inspection of the steele events returned by the event-logs command reveals that a total of 341 attempts (purple bar) were executed in fulfilling the user's request to transfer 100 files from ALCF to steele. 100 of the attempts succeeded. During the run 191 timeouts and 81 "unknown state" events were recorded. Examination of the debug information returned by event-logs reveals 81 instances of a GridFTP-like error message: "error: an end-of-file was reached globus_xio: An end of file occurred PY_ISSUE_STDERR".

Run #6: Ten Concurrent Users, Eleven Sites, One Thousand Files

[Globus.org version]

After removing two broken endpoint definitions inadvertently created during setup of the fifth run (see Finding #7) the requests were resubmitted. The two charts show the time elapsed and number of attempts for each transfer in this run, grouped by destination/user. All files were successfully transferred after 66 minutes for an average throughput of 400 Mbps. A total of 1,000 attempts (mean 1, median 1) was recorded. From Globus.org's perspective this was a clean run:

It took 36 minutes 39 seconds to extract the event information from Globus.org for Run #6; 1,000 start events and 1,000 success events were returned:

Run #5: Ten Concurrent Users, Eleven Sites, One Thousand Files

[Globus.org version]

Ten users each submitted a request to transfer 100 200MB files from ALCF to various destinations. The two charts below show the time elapsed and number of attempts for each transfer, grouped by destination/user. All files were successfully transferred after 86 minutes for an average throughput of 310 Mbps. A total of 1,836 attempts (mean 1.8, median 1) was recorded during the run:

Run #4: Ten Concurrent Users, Two Sites, One Hundred Thousand Files

[Globus.org version]

Ten users each submitted a request to transfer 10,000 200MB files from ALCF to PADS; Globus.org took 58 minutes to store the 100,000 requests in its database. Requests were submitted sequentially (10k red user requests submitted, immediately followed by 10k orange user requests, immediately followed by yellow, lime green, mint green, cyan, blue, purple, fuchsia, pink). The charts below show the time elapsed for each individual transfer in the run: Transfer Time represents the difference between the time the transfer succeeded and the time the request was initially recorded in the database. The righthand chart shows per-user breakdowns. All files were successfully transferred after 21 hours and 8 minutes for an average throughput of 2.1 Gbps. A code defect was uncovered during the run: Globus.org failed to send email notifications to users red, yellow, mint green, cyan, and blue upon completion of their transfer jobs.

A total of 241,537 attempts (mean 2.4, median 2) was recorded during the run:

Run #3: Six Concurrent Users, Two Sites, Sixty Thousand Files

[Globus.org version]

Six users each submitted a request to transfer 10,000 200MB files from ALCF to PADS; Globus.org took 33 minutes to store the 60,000 requests in its database. Requests were submitted sequentially (10k red user requests submitted, immediately followed by 10k yellow user requests, immediately followed by green, cyan, blue, fuchsia). The charts below show all 60,000 requests ordered by transfer time, as well as per-user breakdowns. All files were successfully transferred after 13 hours and 13 minutes for an average throughput of 2.0 Gbps. The notification defect was triggered: Globus.org failed to send email notifications to users cyan, blue, and fuchsia upon completion of their transfer jobs.

A total of 146,411 attempts (mean 2.4, median 2) was recorded during the run:

Run #2: Three Concurrent Users, Two Sites, Thirty Thousand Files

[Globus.org version]

Three users each submitted a request to transfer 10,000 200MB files from ALCF to PADS; Globus.org took 11 minutes 20 seconds to store the 30,000 requests in its database. Requests were submitted sequentially (10k red user requests were submitted, immediately followed by 10k yellow user requests, immediately followed by green). The charts below show all 30,000 requests ordered by transfer time, as well as per-user breakdowns. All 30,000 files were successfully transferred after 6 hours and 41 minutes for an average throughput of 2.0 Gbps.

A total of 48,158 attempts (mean 1.6, median 1.0) was recorded during the run:

Run #1: Single User, Two Sites, Ten Thousand Files

[Globus.org version]

A single user submitted a request to transfer 10,000 200MB files from ALCF to PADS; Globus.org took 1 minute 35 seconds to store the 10,000 requests in its database. The charts below show the time elapsed for each individual transfer and the number of transfer attempts during the run. All files were successfully transferred after 3 hours and 39 minutes for an average throughput of 1.2 Gbps. A total of 35,864 attempts (mean 3.6, median 3) was recorded during the run:
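Runs #1-4 each report how long Globus.org took to store the submitted requests in its database. A small sketch (illustrative names, figures taken from the run descriptions) converts those times into requests stored per second, which makes the runs easier to compare.

```python
def storage_rate(requests, minutes, seconds=0):
    """Requests stored per second, given the reported storage time."""
    return requests / (minutes * 60 + seconds)


# (requests, minutes, seconds) as reported for Runs #1-4.
STORE_TIMES = {
    "Run #1": (10_000, 1, 35),
    "Run #2": (30_000, 11, 20),
    "Run #3": (60_000, 33, 0),
    "Run #4": (100_000, 58, 0),
}

for run, (requests, minutes, seconds) in STORE_TIMES.items():
    print(f"{run}: {storage_rate(requests, minutes, seconds):.1f} requests/s")
# Run #1: 105.3, Run #2: 44.1, Run #3: 30.3, Run #4: 28.7
```

The rate drops from about 105 requests per second with one user to under 30 with ten; the report does not identify the cause, so this is an observation about the reported times only.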