



Links to the transfer metadata and event logs for each run are embedded in descriptive text throughout the report; compilations of the data are also available for download here:
[Transfer metadata] [Event logs] [Debug output] [Plots]
[Globus.org code version] [submission script] [activation commands]
Eleven users submitted requests to transfer a total of 100,000 200MB files from ALCF. Each user sent files to a different site. DOE, NSF and international destinations were involved: 49,000 files each to NERSC and PADS, 1,000 files to OLCF, and 125 files each to abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele. The charts below are based on a composite of the data in the Run #12 Details subsections; they show the time elapsed for every transfer in Run #12, as well as the number of transfer attempts for each file. All 100,000 transfers in the run succeeded (checksums verified out-of-band) after 14 hours 13 minutes for an average throughput of 3.1 Gbps. 421,977 transfer attempts (mean 4.2, median 1) were executed while trying to fulfill the eleven users' requests:
The chart below provides an overview of all events recorded by Globus.org during Run #12; this composite was built from the event data in the Run #12 Details subsections. The 100,000 successful transfers are represented by the purple bar. There were 421,977 transfer attempts (depicted as blue "start" events). A variety of additional events were recorded during the run: 32,610 abnormal ends (red bar), 7,739 "batch" errors (green bar), 379 CA errors, 24,414 endpoint errors (yellow bar), 114,408 file access errors (green bar), and 141,906 unknowns (magenta bar). More detailed debug logs are linked below.
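The average-throughput figures quoted for each run and destination are consistent with total bytes moved divided by elapsed wall-clock time. The sketch below reproduces several of the Run #12 and Run #11 figures under two assumptions the report does not state explicitly: 1 MB = 10^6 bytes and 1 Gbps = 10^9 bits per second. The function name is hypothetical.

```python
def average_throughput_gbps(n_files: int, file_mb: float, hours: int, minutes: int) -> float:
    """Average throughput in Gbps, assuming decimal units (1 MB = 10**6 bytes)."""
    total_bits = n_files * file_mb * 10**6 * 8   # bytes -> bits
    elapsed_s = hours * 3600 + minutes * 60      # wall-clock time of the run
    return total_bits / elapsed_s / 10**9        # bits/s -> Gbps


# Run #12: 100,000 files of 200 MB each, completed after 14 h 13 min.
print(f"{average_throughput_gbps(100_000, 200, 14, 13):.1f} Gbps")  # prints "3.1 Gbps"
```

The same calculation yields 1.7 Gbps for the Run #12 NERSC leg (49,000 files, 12 h 55 min) and 2.6 Gbps for all of Run #11 (100,000 files, 17 h 15 min), matching the reported values.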
The NERSC transfer metadata for Run #12, extracted with the status command, can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed until each individual NERSC transfer request succeeded. All 49,000 files were successfully transferred after 12 hours 55 minutes for an average throughput of 1.7 Gbps. A total of 49,588 attempts (mean 1, median 1) was recorded during the run:
A drilldown inspection of the NERSC events for Run #12 was made possible by querying the event logs; the data are here. The logs reveal that all 49,000 NERSC requests succeeded (red bar) after a total of 49,588 attempts (green bar) and 394 "batch" error events. Debug output for the NERSC transfers can be found here.
The OLCF transfer metadata for Run #12, extracted with the status command, can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each individual OLCF transfer during Run #12. All 1,000 files were successfully transferred after 2 hours 40 minutes for an average throughput of 166 Mbps. A total of 115,454 attempts (mean 115.5, median 138) was recorded during the run:
A drilldown inspection of the OLCF events for Run #12 was made possible by querying the event logs; the data are here. The logs reveal that all 1,000 OLCF requests succeeded (green bar) after a total of 115,454 attempts (blue bar). 114,408 file access errors (red bar) were recorded during the run. The debug output provides more detail, including filenames and failure times (in UTC).
The PADS transfer metadata for Run #12, extracted with the status command, can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each individual PADS transfer during Run #12. All 49,000 files were successfully transferred after 14 hours 12 minutes for an average throughput of 1.5 Gbps. A total of 255,881 attempts (mean 5.2, median 4) was recorded during the run:
A drilldown inspection of the PADS events for Run #12 was made possible by querying the event logs. An inspection of the event data reveals that all 49,000 PADS requests reportedly succeeded (blue bar), and a total of 255,881 attempts (purple bar) occurred in the course of fulfilling those requests. A variety of additional events were recorded during the run: 32,610 abnormal ends (red bar), 7,291 batch errors (magenta bar), 379 CA errors, 24,414 endpoint errors and 141,906 unknowns (cyan bar). Debug information about the PADS run is also available for inspection.
Metadata describing the 125-file runs (extracted on a per-user basis with the status command) can be found here: abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele. The following two charts are based on composites of these data. The lefthand chart shows the time elapsed for each individual file, grouped by destination/user. All 1,000 files were successfully transferred after 40 minutes for an average throughput of 663 Mbps. A total of 1,054 attempts (mean 1, median 1) was recorded during the run:
A drilldown inspection of the events for the 125-file destinations in Run #12 confirms that all 1,000 requests succeeded (red bar) after 1,054 attempts (green bar) and 54 "batch" errors. Individual event logs for the sites are here: abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele; the composite log is here. Debug output for queenbee can be found here.
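The composite Run #12 event chart is described as built from these per-site subsections. A minimal sketch of that aggregation, using the per-site counts reported above; the event-type keys are illustrative labels chosen here, not Globus.org identifiers.

```python
from collections import Counter

# Per-site event counts as reported in the Run #12 Details subsections.
# Keys are illustrative labels, not actual Globus.org event names.
run12_sites = {
    "NERSC": Counter(start=49_588, success=49_000, batch_error=394),
    "OLCF": Counter(start=115_454, success=1_000, file_access_error=114_408),
    "PADS": Counter(start=255_881, success=49_000, abnormal_end=32_610,
                    batch_error=7_291, ca_error=379, endpoint_error=24_414,
                    unknown=141_906),
    "125-file sites": Counter(start=1_054, success=1_000, batch_error=54),
}

# Counter supports "+", so summing from an empty Counter merges all sites.
composite = sum(run12_sites.values(), Counter())
for event, count in sorted(composite.items()):
    print(f"{event:>18}: {count:,}")
```

The resulting totals (421,977 starts, 100,000 successes, 7,739 batch errors, and so on) agree with the composite figures given for Run #12.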
[Globus.org code version] [submission script] [activation commands]
Eleven users submitted requests to transfer a total of 100,000 200MB files from ALCF. Each user sent files to a different site: 49,000 files each to NERSC and PADS, 1,000 files to OLCF, and 125 files each to abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele. The charts below are composites of the data in the Run #11 Details subsections. The lefthand chart shows the time elapsed for each transfer in Run #11, ordered by transfer time. All transfers reportedly succeeded after 17 hours 15 minutes for an average throughput of 2.6 Gbps. 823,715 attempts (mean 8.2, median 1) were executed while trying to fulfill the eleven users' requests:
The chart below provides a composite view of all events recorded by Globus.org during Run #11. The successful transfers are represented by the blue bar. There were 823,715 transfer attempts (depicted as purple START events). A variety of additional events were recorded during the run: 64,620 abnormal ends (red bar), 8,106 "batch" errors (magenta bar), 114,542 file access errors (green bar), 1,065 CA errors (yellow bar), and 534,981 unknowns (cyan bar). Detailed event debug information is included in the site-specific Run #11 Details subsections below.
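The mean attempt count is total attempts divided by files (823,715 / 100,000 ≈ 8.2 for Run #11), while the median comes from the per-file distribution. Because every file needs at least one attempt, a median of 1 means at least half the files succeeded on the first try, so a mean well above 1 reflects a minority of heavily retried files. A minimal sketch, assuming per-file attempt counts have already been extracted from the status output; the list below is toy data, not taken from any run.

```python
from statistics import mean, median


def attempt_summary(attempts_per_file: list[int]) -> tuple[int, float, float]:
    """Return (total attempts, mean, median) for per-file attempt counts."""
    return sum(attempts_per_file), mean(attempts_per_file), median(attempts_per_file)


# Toy data: most files succeed on the first attempt, one needs many retries.
toy_attempts = [1, 1, 1, 1, 1, 2, 3, 40]
total, avg, mid = attempt_summary(toy_attempts)
print(total, avg, mid)
```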
The NERSC transfer metadata for Run #11 (extracted with the status command) can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed until each individual NERSC transfer request succeeded. All 49,000 files were successfully transferred after 17 hours 14 minutes for an average throughput of 1.3 Gbps. A total of 365,801 attempts (mean 7.5, median 1) was recorded during the run:
A drilldown inspection of the NERSC events for Run #11 was made possible by querying the event logs; the event-logs command ran for 8 minutes 22 seconds and produced this file. The data reveal that all 49,000 NERSC requests succeeded (cyan bar) amid a total of 365,801 attempts (blue bar). Three additional event types were recorded during the run: 632 batch errors (magenta bar), 2,931 file access errors (red bar), and 313,023 unknowns (yellow bar). Digging further into the debug output (events11-nersc.txt) shows that the majority of the unknown events are attributable to user error (disk quota exceeded).
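Attributing unknown events to causes such as an exceeded disk quota can be approximated by substring matching on the debug text. A minimal sketch, assuming each debug record is one line of text; the cause table and the sample lines are hypothetical, loosely modeled on error messages quoted elsewhere in this report, and do not reflect the actual layout of events11-nersc.txt.

```python
from collections import Counter

# Hypothetical cause table: lowercase substring -> cause label.
CAUSES = {
    "disk quota exceeded": "disk quota exceeded",
    "end of file occurred": "end of file",
    "the operation was aborted": "aborted",
    "unable to open file": "file open failure",
    "must perform gssapi authentication": "authentication",
}


def classify(line: str) -> str:
    """Return the first matching cause for a debug line, or 'other'."""
    lowered = line.lower()
    for pattern, cause in CAUSES.items():
        if pattern in lowered:
            return cause
    return "other"


# Illustrative lines only; not copied from any run's debug output.
sample = [
    "error: Disk quota exceeded",
    "error: an end-of-file was reached globus_xio: An end of file occurred",
    "530 Must perform GSSAPI authentication.",
    "error: Disk quota exceeded",
]
counts = Counter(classify(line) for line in sample)
print(counts)
```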
The OLCF transfer metadata for Run #11, extracted with the status command, can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each individual OLCF transfer during Run #11. All 1,000 files were successfully transferred after 3 hours 48 minutes for an average throughput of 116 Mbps. A total of 110,445 attempts (mean 110.4, median 105) was recorded during the run:
A drilldown inspection of the OLCF events for Run #11 was made possible by querying the event logs. The event-logs command ran for 1 minute 46 seconds, producing this file. The data reveal that all 1,000 OLCF requests succeeded (green bar) after a total of 110,445 attempts (blue bar). 109,445 file access errors (red bar) were recorded during the run. The debug output provides more detail about failure times and filenames (all times UTC): events11-olcf.txt.
The PADS transfer metadata for Run #11 (extracted with the status command) can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each individual PADS transfer during Run #11. All 49,000 files were transferred after 16 hours 29 minutes for an average throughput of 1.3 Gbps. A total of 346,469 attempts (mean 7.1, median 5) was recorded during the run:
A drilldown inspection of the PADS events for Run #11 was made possible by querying the event logs. The event-logs command ran for 7 minutes 25 seconds and produced this file. An inspection of the event data reveals that all 49,000 PADS requests reportedly succeeded (blue bar), and a total of 346,469 attempts (purple bar) occurred in the course of fulfilling those requests. A variety of additional events were recorded during the run: 64,620 abnormal ends (red bar), 7,474 batch errors (magenta bar), 2,166 file access errors (green bar), 1,065 CA errors (yellow bar), and 221,958 unknowns (cyan bar). Debug information about the PADS run is available for inspection: events11-pads.txt.
Metadata describing the 125-file runs (extracted on a per-user basis with the status command) can be found here: abe, ARCS, bigred, frost, lonestar, queenbee, ranger, steele. The following two charts are based on a composite of these data. The lefthand chart shows the time elapsed for each individual file, grouped by destination/user. All 1,000 files were successfully transferred after 39 minutes for an average throughput of 678 Mbps. A total of 1,000 attempts (mean 1, median 1) was recorded during the run:
A drilldown inspection of the events for the 125-file destinations in Run #11 confirms that all 1,000 requests succeeded (red bar) on the first attempt (cyan bar). Composite event logs for the sites are here.
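The 125-file charts group elapsed times by destination/user. A minimal sketch of that grouping, assuming hypothetical (destination, elapsed minutes) pairs extracted from the per-user status output: each destination's completion time is the largest elapsed time in its group, and the overall completion time is the largest of those. The records below are toy data.

```python
from collections import defaultdict


def completion_by_destination(records):
    """Map each destination to its last completion time (max elapsed minutes)."""
    latest = defaultdict(float)
    for destination, elapsed_min in records:
        latest[destination] = max(latest[destination], elapsed_min)
    return dict(latest)


# Toy (destination, elapsed minutes) pairs, not taken from any run.
toy = [("abe", 12.5), ("abe", 31.0), ("ranger", 8.0), ("ranger", 40.0), ("steele", 22.0)]
per_site = completion_by_destination(toy)
print(per_site)
print(max(per_site.values()))  # overall completion time in minutes
```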
After IU updated their host certs, eleven users each submitted a request to transfer 125 200MB files from ALCF. The transfer metadata for Run #10 (extracted with the status command) can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each transfer, grouped by destination/user. All transfers succeeded after 52 minutes for an average throughput of 709 Mbps. 8,321 attempts (mean 6.1, median 1) were executed while trying to fulfill the eleven users' requests:
A drilldown inspection of the ARCS events for Run #10 reveals that all 125 requests succeeded (red bar) after 152 attempts (green bar); 27 "batch" events were also recorded. Debug information for the ARCS run is available for inspection.
A drilldown inspection of the OLCF events for Run #10 reveals that all 125 requests succeeded (green bar) after 6,228 attempts (cyan bar) were executed. Other recorded events include 6,047 file access errors (red bar) and 56 "batch" events. Debug information for the OLCF run is available for inspection.
A drilldown inspection of the PADS events for Run #10 reveals that a total of 941 attempts (purple bar) were executed during the run and all 125 requests succeeded (cyan bar). In addition, 287 "abnormal end" and 529 "unknown state" events (green bar) were recorded. Debug information for the PADS run is available for inspection.
For Run #9, eleven users each submitted a request to transfer 125 200MB files from ALCF within a 2-hour deadline. The transfer metadata for Run #9 (extracted with the status command) can be found here. The two charts below are based on these data. The lefthand chart shows the time elapsed for each transfer, grouped by destination/user. Because the host certificates at the IU destination had expired, no IU requests succeeded before the deadline. Excluding IU, all transfers succeeded after 51 minutes for an average throughput of 658 Mbps. 9,889 attempts (mean 7.2, median 1) were executed while trying to fulfill the eleven users' requests:
A drilldown inspection of the bigred events from Run #9 reveals that a total of 6,934 attempts (blue bar) were executed. All 125 requests expired (red bar) and 20,037 "unknown state" events (green bar) were recorded. As in Run #8, examination of available debug information reveals multiple "530 Must perform GSSAPI authentication" error messages. Out-of-band communication with the IU admin revealed that their GridFTP host certificates had expired:
A drilldown inspection of the OLCF events for Run #9 reveals that a total of 1,676 attempts (magenta bar) were executed. 125 of the attempts (green bar) succeeded, indicating that all 125 requests were eventually fulfilled. During the run 2,238 file access errors (red bar), 22 timeouts (blue bar), and 80 "unknown state" events (yellow bar) were recorded. As in Run #8, examination of available debug information reveals multiple "Unable to open file" error messages. An out-of-band test suggests an intermittent problem with OLCF's Lustre filesystem:
A drilldown inspection of the ranger events for Run #9 reveals that a total of 279 attempts (blue bar) were executed during the run. All 125 requests eventually succeeded. During the run 156 timeouts were recorded:
Eleven users each submitted a request to transfer 125 200MB files from ALCF within a 2-hour deadline. The transfer metadata for Run #8 (extracted with the status command) can be found here. The two charts below show the time elapsed for each transfer, grouped by destination/user. Fewer than 1% of the IU and Purdue requests succeeded prior to the deadline. Excluding IU and Purdue, all transfers succeeded after 48 minutes for an average throughput of 622 Mbps. 11,216 attempts (mean 8.2, median 1) were executed while trying to fulfill the eleven users' requests:
A drilldown inspection of the bigred events for Run #8 reveals that a total of 6,523 attempts (blue bar) were executed. All 125 requests expired (red bar) and 18,674 "unknown state" events (green bar) were recorded. Examination of available debug information reveals multiple "530 Must perform GSSAPI authentication." error messages.
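Deciding which requests met the 2-hour deadline used in Runs #8 and #9, and computing a run's results with a site excluded (as with IU and Purdue above), reduces to filtering request records. A minimal sketch, assuming a hypothetical record holding a destination and an optional completion offset in minutes (None for a request that expired); the records are toy data.

```python
from dataclasses import dataclass
from typing import Optional

DEADLINE_MIN = 120  # the 2-hour deadline used in Runs #8 and #9


@dataclass
class Request:
    destination: str
    done_after_min: Optional[float]  # minutes after submission; None if it expired


def met_deadline(req: Request) -> bool:
    return req.done_after_min is not None and req.done_after_min <= DEADLINE_MIN


def success_rate(requests: list[Request], exclude: frozenset[str] = frozenset()) -> float:
    """Fraction of non-excluded requests that finished before the deadline."""
    kept = [r for r in requests if r.destination not in exclude]
    return sum(met_deadline(r) for r in kept) / len(kept)  # assumes kept is non-empty


# Toy records: two sites complete, one site (bigred) expires entirely.
toy = [
    Request("ranger", 30.0), Request("ranger", 51.0),
    Request("OLCF", 45.0), Request("bigred", None), Request("bigred", None),
]
print(success_rate(toy))                                 # 0.6
print(success_rate(toy, exclude=frozenset({"bigred"})))  # 1.0
```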
It took 5 minutes 2 seconds to extract the bigred event information using the event-logs command:
A drilldown inspection of the OLCF events for Run #8 reveals that a total of 1,576 attempts (purple bar) were executed. 125 of the attempts (cyan bar) succeeded, indicating that all 125 requests were eventually fulfilled. During the run 2,285 file access errors (red bar) and 86 "unknown state" events (green bar) were recorded. Examination of available debug information reveals multiple instances of the error message "500-globus_xio: Unable to open file //lustre/widow1/proj/csc024/childers/ddest/multi/cdc3/100Kfiles200M/2-1Kfiles200M/destination-filename 500-globus_xio: System error in open: Permission denied".
It took 4 minutes 12 seconds to extract the OLCF event information using the event-logs command:
A drilldown inspection of the steele events for Run #8 reveals that a total of 2,117 attempts (blue bar) were executed. 123 requests expired (red bar) and 2 requests succeeded. During the run 123 "batch errors" (magenta bar), 486 timeouts (cyan bar), and 2,868 "unknown state" events (yellow bar) were recorded. Examination of available debug information reveals 2,831 "error: an end-of-file was reached globus_xio: An end of file occurred" messages, and 37 "error: globus_ftp_client: the operation was aborted" messages.
It took 4 minutes 32 seconds to extract the steele event information using the event-logs command:
Ten users each submitted a request to transfer 100 200MB files from ALCF. All destinations were identical to those in Runs #5 and #6, except that the NERSC node DTN02 was excluded. The two charts below show the time elapsed and number of attempts for each transfer, grouped by destination/user. All files were successfully transferred after 31 minutes for an average throughput of 865 Mbps. A total of 1,214 attempts (mean 1.2, median 1) was recorded during the run:
A drilldown inspection of the steele events returned by the event-logs command reveals that a total of 341 attempts (purple bar) were executed in fulfilling the user's request to transfer 100 files from ALCF to steele. 100 of the attempts succeeded. During the run 191 timeouts and 81 "unknown state" events were recorded. Examination of the debug information returned by event-logs reveals 81 instances of a GridFTP-like error message: "error: an end-of-file was reached globus_xio: An end of file occurred PY_ISSUE_STDERR".
After removing two broken endpoint definitions inadvertently created during setup of the fifth run (see Finding #7), the requests were resubmitted. The two charts show the time elapsed and number of attempts for each transfer in this run, grouped by destination/user. All files were successfully transferred after 66 minutes for an average throughput of 400 Mbps. A total of 1,000 attempts (mean 1, median 1) was recorded. From Globus.org's perspective this was a clean run:
It took 36 minutes 39 seconds to extract the event information from Globus.org for Run #6; 1,000 start events and 1,000 success events were returned:
Ten users each submitted a request to transfer 100 200MB files from ALCF to various destinations. The two charts below show the time elapsed and number of attempts for each transfer, grouped by destination/user. All files were successfully transferred after 86 minutes for an average throughput of 310 Mbps. A total of 1,836 attempts (mean 1.8, median 1) was recorded during the run:
Ten users each submitted a request to transfer 10,000 200MB files from ALCF to PADS; Globus.org took 58 minutes to store the 100,000 requests in its database. Requests were submitted sequentially (10k red user requests submitted, immediately followed by 10k orange user requests, immediately followed by yellow, lime green, mint green, cyan, blue, purple, fuchsia, pink.) The charts below show the time elapsed for each individual transfer in the run: Transfer Time represents the difference between the time the transfer succeeded and the time the request was initially recorded in the database. The righthand chart shows per-user breakdowns. All files were successfully transferred after 21 hours and 8 minutes for an average throughput of 2.1 Gbps. A code defect was uncovered during the run: Globus.org failed to send email notifications to users red, yellow, mint green, cyan, and blue upon completion of their transfer jobs.
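The Transfer Time definition above is a simple timestamp difference. A minimal sketch, assuming both timestamps are available in the same time zone (e.g., UTC); the function name and the timestamps are hypothetical. Because the clock starts when the request is recorded, the value includes any time a request spent queued behind earlier submissions.

```python
from datetime import datetime, timedelta


def transfer_time(request_recorded: datetime, transfer_succeeded: datetime) -> timedelta:
    """Success time minus the time the request was first recorded in the database."""
    if transfer_succeeded < request_recorded:
        raise ValueError("success timestamp precedes request record")
    return transfer_succeeded - request_recorded


# Hypothetical timestamps (UTC) for a single request.
recorded = datetime.fromisoformat("2000-01-01T08:00:00")
succeeded = datetime.fromisoformat("2000-01-01T10:35:30")
print(transfer_time(recorded, succeeded))  # prints "2:35:30"
```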
A total of 241,537 attempts (mean 2.4, median 2) was recorded during the run:
Six users each submitted a request to transfer 10,000 200MB files from ALCF to PADS; Globus.org took 33 minutes to store the 60,000 requests in its database. Requests were submitted sequentially (10k red user requests submitted, immediately followed by 10k yellow user requests, immediately followed by green, cyan, blue, fuchsia.) The charts below show all 60,000 requests ordered by transfer time, as well as per-user breakdowns. All files were successfully transferred after 13 hours and 13 minutes for an average throughput of 2.0 Gbps. The notification defect was triggered: Globus.org failed to send email notifications to users cyan, blue, and fuchsia upon completion of their transfer jobs.
A total of 146,411 attempts (mean 2.4, median 2) was recorded during the run:
Three users each submitted a request to transfer 10,000 200MB files from ALCF to PADS; Globus.org took 11 minutes 20 seconds to store the 30,000 requests in its database. Requests were submitted sequentially (10k red user requests were submitted, immediately followed by 10k yellow user requests, immediately followed by green.) The charts below show all 30,000 requests ordered by transfer time, as well as per-user breakdowns. All 30,000 files were successfully transferred after 6 hours and 41 minutes for an average throughput of 2.0 Gbps.
A total of 48,158 attempts (mean 1.6, median 1) was recorded during the run:
A single user submitted a request to transfer 10,000 200MB files from ALCF to PADS; Globus.org took 1 minute 35 seconds to store the 10,000 requests in its database. The charts below show the time elapsed for each individual transfer and the number of transfer attempts during the run. All files were successfully transferred after 3 hours and 39 minutes for an average throughput of 1.2 Gbps. A total of 35,864 attempts (mean 3.6, median 3) was recorded during the run: