Usage
Darshan job summary tool
As a starting point, users can generate detailed HTML summary reports of I/O activity for a given Darshan log using PyDarshan's job summary tool, whose usage is described below.
usage: darshan summary [-h] [--output OUTPUT] [--enable_dxt_heatmap]
                       [--exclude_names EXCLUDE_NAMES] [--include_names INCLUDE_NAMES]
                       log_path

Generates a Darshan Summary Report

positional arguments:
  log_path              Specify path to darshan log.

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT       Specify output filename.
  --enable_dxt_heatmap  Enable DXT-based versions of I/O activity heatmaps.
  --exclude_names EXCLUDE_NAMES
                        regex patterns for file record names to exclude in summary report
  --include_names INCLUDE_NAMES
                        regex patterns for file record names to include in summary report
For example, the following command would generate an HTML job summary report for a Darshan log file named example.darshan.
$ python -m darshan summary example.darshan
If the --output option is not specified, the output HTML report name is derived from the input log file name (i.e., the above command would generate an HTML report named example_report.html).
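The remaining options can be combined as needed. For instance, the following command (the output filename here is illustrative) would write the report to a user-specified file and enable the DXT-based heatmap variants:

$ python -m darshan summary --output my_report.html --enable_dxt_heatmap example.darshan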
Other Darshan CLI tools
There are also command line tools available for quickly printing terminal output describing general I/O statistics of one or more input Darshan logs. The job_stats tool summarizes key job-level I/O parameters for each of a given set of Darshan logs, ordering the jobs according to some I/O metric. Alternatively, the file_stats tool summarizes key file-level I/O parameters for each file accessed across a set of Darshan logs, with the files ordered according to some I/O metric.
Usage of the job_stats tool is described below.
usage: darshan job_stats [-h] [--log_paths_file LOG_PATHS_FILE] [--module [{POSIX,MPI-IO,STDIO}]]
                         [--order_by [{perf_by_slowest,time_by_slowest,total_bytes,total_files}]]
                         [--limit [LIMIT]] [--csv] [--exclude_names EXCLUDE_NAMES]
                         [--include_names INCLUDE_NAMES]
                         [log_paths [log_paths ...]]

Print statistics describing key metadata and I/O performance metrics for a given list of jobs.

positional arguments:
  log_paths             specify the paths to Darshan log files

optional arguments:
  -h, --help            show this help message and exit
  --log_paths_file LOG_PATHS_FILE
                        specify the path to a manifest file listing Darshan log files
  --module [{POSIX,MPI-IO,STDIO}], -m [{POSIX,MPI-IO,STDIO}]
                        specify the Darshan module to generate job stats for (default: POSIX)
  --order_by [{perf_by_slowest,time_by_slowest,total_bytes,total_files}], -o [{perf_by_slowest,time_by_slowest,total_bytes,total_files}]
                        specify the I/O metric to order jobs by (default: total_bytes)
  --limit [LIMIT], -l [LIMIT]
                        limit output to the top LIMIT number of jobs according to selected metric
  --csv, -c             output job stats in CSV format
  --exclude_names EXCLUDE_NAMES, -e EXCLUDE_NAMES
                        regex patterns for file record names to exclude in stats
  --include_names INCLUDE_NAMES, -i INCLUDE_NAMES
                        regex patterns for file record names to include in stats
Options allow users to calculate stats for specific modules, to order jobs by a number of different I/O statistics, to limit output to the top N jobs, to print in CSV format (rather than the default Rich-formatted output), and to filter file names within jobs. Note that users can either provide the list of Darshan logs directly on the command line or use a manifest file in cases where many logs are to be analyzed at once.
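For example, the following command (with hypothetical log file names) would print MPI-IO stats limited to the top 10 jobs according to the perf_by_slowest metric:

$ python -m darshan job_stats --module MPI-IO --order_by perf_by_slowest --limit 10 job1.darshan job2.darshan job3.darshan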
Usage of the file_stats tool is described below.
usage: darshan file_stats [-h] [--log_paths_file LOG_PATHS_FILE] [--module [{POSIX,MPI-IO,STDIO}]]
                          [--order_by [{bytes_read,bytes_written,reads,writes,total_jobs}]]
                          [--limit [LIMIT]] [--csv] [--exclude_names EXCLUDE_NAMES]
                          [--include_names INCLUDE_NAMES]
                          [log_paths [log_paths ...]]

Print statistics describing key metadata and I/O performance metrics for files accessed by a given list of jobs.

positional arguments:
  log_paths             specify the paths to Darshan log files

optional arguments:
  -h, --help            show this help message and exit
  --log_paths_file LOG_PATHS_FILE
                        specify the path to a manifest file listing Darshan log files
  --module [{POSIX,MPI-IO,STDIO}], -m [{POSIX,MPI-IO,STDIO}]
                        specify the Darshan module to generate file stats for (default: POSIX)
  --order_by [{bytes_read,bytes_written,reads,writes,total_jobs}], -o [{bytes_read,bytes_written,reads,writes,total_jobs}]
                        specify the I/O metric to order files by (default: bytes_read)
  --limit [LIMIT], -l [LIMIT]
                        limit output to the top LIMIT number of files according to selected metric
  --csv, -c             output file stats in CSV format
  --exclude_names EXCLUDE_NAMES, -e EXCLUDE_NAMES
                        regex patterns for file record names to exclude in stats
  --include_names INCLUDE_NAMES, -i INCLUDE_NAMES
                        regex patterns for file record names to include in stats
The options for the file_stats tool are largely identical to those of the job_stats tool, aside from the slightly different set of I/O metrics that can be used to order output.
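For example, the following command (again with hypothetical log file names) would print file stats in CSV format, ordered by bytes written:

$ python -m darshan file_stats --order_by bytes_written --csv job1.darshan job2.darshan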
Darshan Report interface
The Darshan Report interface helps users develop custom log analysis tools. The example below demonstrates how to use this interface to open a Darshan log file, read in log metadata and instrumentation records, and export record data to a pandas DataFrame.
import darshan

# open a Darshan log file and read all data stored in it
with darshan.DarshanReport("example.darshan", read_all=True) as report:
    # print the metadata dict for this log
    print("metadata: ", report.metadata)
    # print job runtime and nprocs
    print("run_time: ", report.metadata['job']['run_time'])
    print("nprocs: ", report.metadata['job']['nprocs'])

    # print modules contained in the report
    print("modules: ", list(report.modules.keys()))

    # export POSIX module records to DataFrame and print
    posix_df = report.records['POSIX'].to_df()
    print("POSIX df: ", posix_df)
Darshan CFFI backend interface
Generally, it is more convenient to access a Darshan log from Python using the Report interface, which also caches already-fetched information such as log records on a per-module basis. If this overhead is unwanted, the CFFI backend interface can be used directly to gain fine-grained control over which log data is loaded.
The example below demonstrates some usage of the CFFI backend for opening a log file and accessing different types of log data:
import darshan.backend.cffi_backend as darshanll
log = darshanll.log_open("example.darshan")
# Access various job information
darshanll.log_get_job(log)
# Example Return:
# {'jobid': 4478544,
# 'uid': 69615,
# 'start_time': 1490000867,
# 'end_time': 1490000983,
# 'metadata': {'lib_ver': '3.1.3', 'h': 'romio_no_indep_rw=true;cb_nodes=4'}}
# Access available modules
darshanll.log_get_modules(log)
# Example Return:
# {'POSIX': {'len': 186, 'ver': 3, 'idx': 1},
# 'MPI-IO': {'len': 154, 'ver': 2, 'idx': 2},
# 'LUSTRE': {'len': 87, 'ver': 1, 'idx': 6},
# 'STDIO': {'len': 3234, 'ver': 1, 'idx': 7}}
# Access different record types as numpy arrays, with integer and float counters separated
# Example Return: {'counters': array([...], dtype=uint64), 'fcounters': array([...])}
posix_record = darshanll.log_get_record(log, "POSIX")
mpiio_record = darshanll.log_get_record(log, "MPI-IO")
stdio_record = darshanll.log_get_record(log, "STDIO")
# ...
darshanll.log_close(log)
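To interpret the raw counter arrays, the values can be paired with their counter labels. A minimal sketch, assuming the backend's counter_names() helper returns the ordered list of counter labels for a given module:

import darshan.backend.cffi_backend as darshanll

log = darshanll.log_open("example.darshan")
record = darshanll.log_get_record(log, "POSIX")
# pair each integer counter value with its label
# (counter_names() is assumed to return labels in array order)
for name, value in zip(darshanll.counter_names("POSIX"), record['counters']):
    print(name, value)
darshanll.log_close(log)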