Usage
Darshan job summary tool
As a starting point, users can generate detailed HTML summary reports of I/O activity for a given Darshan log using PyDarshan's job summary tool, whose usage is described below.
usage: darshan summary [-h] [--output OUTPUT] [--enable_dxt_heatmap]
                       [--exclude_names EXCLUDE_NAMES] [--include_names INCLUDE_NAMES]
                       log_path

Generates a Darshan Summary Report

positional arguments:
  log_path              Specify path to darshan log.

optional arguments:
  -h, --help            show this help message and exit
  --output OUTPUT       Specify output filename.
  --enable_dxt_heatmap  Enable DXT-based versions of I/O activity heatmaps.
  --exclude_names EXCLUDE_NAMES
                        regex patterns for file record names to exclude in summary report
  --include_names INCLUDE_NAMES
                        regex patterns for file record names to include in summary report
For example, the following command would generate an HTML job summary report for a Darshan log file named example.darshan.
$ python -m darshan summary example.darshan
If the --output option is not specified, the output HTML report name is derived from the input log file name (i.e., the above command would generate an HTML report named example_report.html).
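The remaining options can be combined as needed. For instance, the following command (the output filename here is illustrative) would write the report to a user-specified file and enable the DXT-based heatmap variants:

$ python -m darshan summary --output my_report.html --enable_dxt_heatmap example.darshan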
Other Darshan CLI tools
There are also command line tools available for quickly printing terminal output describing general I/O statistics of one or more input Darshan logs. The job_stats tool summarizes key job-level I/O parameters for each of a given set of Darshan logs, ordering the jobs according to some I/O metric. Alternatively, the file_stats tool summarizes key file-level I/O parameters for each file accessed across a set of Darshan logs, with the files ordered according to some I/O metric.
Usage of the job_stats tool is described below.
usage: darshan job_stats [-h] [--log_paths_file LOG_PATHS_FILE] [--module [{POSIX,MPI-IO,STDIO}]]
                         [--order_by [{perf_by_slowest,time_by_slowest,total_bytes,total_files}]]
                         [--limit [LIMIT]] [--csv] [--exclude_names EXCLUDE_NAMES]
                         [--include_names INCLUDE_NAMES]
                         [log_paths [log_paths ...]]

Print statistics describing key metadata and I/O performance metrics for a given list of jobs.

positional arguments:
  log_paths             specify the paths to Darshan log files

optional arguments:
  -h, --help            show this help message and exit
  --log_paths_file LOG_PATHS_FILE
                        specify the path to a manifest file listing Darshan log files
  --module [{POSIX,MPI-IO,STDIO}], -m [{POSIX,MPI-IO,STDIO}]
                        specify the Darshan module to generate job stats for (default: POSIX)
  --order_by [{perf_by_slowest,time_by_slowest,total_bytes,total_files}], -o [{perf_by_slowest,time_by_slowest,total_bytes,total_files}]
                        specify the I/O metric to order jobs by (default: total_bytes)
  --limit [LIMIT], -l [LIMIT]
                        limit output to the top LIMIT number of jobs according to selected metric
  --csv, -c             output job stats in CSV format
  --exclude_names EXCLUDE_NAMES, -e EXCLUDE_NAMES
                        regex patterns for file record names to exclude in stats
  --include_names INCLUDE_NAMES, -i INCLUDE_NAMES
                        regex patterns for file record names to include in stats
Options allow users to calculate stats for specific modules, to order jobs by a number of different I/O statistics, to limit output to the top N jobs, to print in CSV format (rather than the default Rich-formatted output), and to filter file names within jobs. Note that users can either provide the list of Darshan logs directly on the command line or use a manifest file in cases where many logs are to be analyzed at once.
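For example, the following command (with hypothetical log file names) would print MPI-IO stats limited to the top 10 jobs according to the perf_by_slowest metric:

$ python -m darshan job_stats --module MPI-IO --order_by perf_by_slowest --limit 10 job1.darshan job2.darshan job3.darshan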
Usage of the file_stats tool is described below.
usage: darshan file_stats [-h] [--log_paths_file LOG_PATHS_FILE] [--module [{POSIX,MPI-IO,STDIO}]]
                          [--order_by [{bytes_read,bytes_written,reads,writes,total_jobs}]]
                          [--limit [LIMIT]] [--csv] [--exclude_names EXCLUDE_NAMES]
                          [--include_names INCLUDE_NAMES]
                          [log_paths [log_paths ...]]

Print statistics describing key metadata and I/O performance metrics for files accessed by a given list of jobs.

positional arguments:
  log_paths             specify the paths to Darshan log files

optional arguments:
  -h, --help            show this help message and exit
  --log_paths_file LOG_PATHS_FILE
                        specify the path to a manifest file listing Darshan log files
  --module [{POSIX,MPI-IO,STDIO}], -m [{POSIX,MPI-IO,STDIO}]
                        specify the Darshan module to generate file stats for (default: POSIX)
  --order_by [{bytes_read,bytes_written,reads,writes,total_jobs}], -o [{bytes_read,bytes_written,reads,writes,total_jobs}]
                        specify the I/O metric to order files by (default: bytes_read)
  --limit [LIMIT], -l [LIMIT]
                        limit output to the top LIMIT number of files according to selected metric
  --csv, -c             output file stats in CSV format
  --exclude_names EXCLUDE_NAMES, -e EXCLUDE_NAMES
                        regex patterns for file record names to exclude in stats
  --include_names INCLUDE_NAMES, -i INCLUDE_NAMES
                        regex patterns for file record names to include in stats
The options for the file_stats tool are largely identical to those of the job_stats tool, aside from the slightly different set of I/O metrics that can be used to order output.
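For example, the following command (again with hypothetical log file names) would print file stats in CSV format, ordered by bytes written:

$ python -m darshan file_stats --order_by bytes_written --csv job1.darshan job2.darshan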
Darshan Report interface
The Darshan Report interface helps users develop custom log analysis tools. The example below demonstrates how to use this interface to open a Darshan log file, read in log metadata and instrumentation records, and export record data to a pandas DataFrame.
import darshan

# open a Darshan log file and read all data stored in it
with darshan.DarshanReport("example.darshan", read_all=True) as report:
    # print the metadata dict for this log
    print("metadata: ", report.metadata)
    # print job runtime and nprocs
    print("run_time: ", report.metadata['job']['run_time'])
    print("nprocs: ", report.metadata['job']['nprocs'])

    # print modules contained in the report
    print("modules: ", list(report.modules.keys()))

    # export POSIX module records to DataFrame and print
    posix_df = report.records['POSIX'].to_df()
    print("POSIX df: ", posix_df)
Darshan CFFI backend interface
Generally, it is more convenient to access a Darshan log from Python using the Report interface, which also caches already-fetched information such as log records on a per-module basis. If this overhead is unwanted, the CFFI backend interface can be used directly to gain fine-grained control over which log data is loaded.
The example below demonstrates some usage of the CFFI backend for opening a log file and accessing different types of log data:
import darshan.backend.cffi_backend as darshanll
log = darshanll.log_open("example.darshan")
# Access various job information
darshanll.log_get_job(log)
# Example Return:
# {'jobid': 4478544,
# 'uid': 69615,
# 'start_time': 1490000867,
# 'end_time': 1490000983,
# 'metadata': {'lib_ver': '3.1.3', 'h': 'romio_no_indep_rw=true;cb_nodes=4'}}
# Access available modules
darshanll.log_get_modules(log)
# Example Return:
# {'POSIX': {'len': 186, 'ver': 3, 'idx': 1},
# 'MPI-IO': {'len': 154, 'ver': 2, 'idx': 2},
# 'LUSTRE': {'len': 87, 'ver': 1, 'idx': 6},
# 'STDIO': {'len': 3234, 'ver': 1, 'idx': 7}}
# Access different record types as numpy arrays, with integer and float counters separated
# Example Return: {'counters': array([...], dtype=uint64), 'fcounters': array([...])}
posix_record = darshanll.log_get_record(log, "POSIX")
mpiio_record = darshanll.log_get_record(log, "MPI-IO")
stdio_record = darshanll.log_get_record(log, "STDIO")
# ...
darshanll.log_close(log)
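To interpret the raw counter arrays, the values can be paired with their counter labels. A minimal sketch, assuming the backend's counter_names() helper returns the ordered list of counter labels for a given module:

import darshan.backend.cffi_backend as darshanll

log = darshanll.log_open("example.darshan")
record = darshanll.log_get_record(log, "POSIX")
# pair each integer counter value with its label
# (counter_names() is assumed to return labels in array order)
for name, value in zip(darshanll.counter_names("POSIX"), record['counters']):
    print(name, value)
darshanll.log_close(log)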