Darshan 3.2.1 bugfix release available

Due to a reported bug in last week’s 3.2.0 release of Darshan, we have decided to quickly release Darshan 3.2.1 for our users. It is available for download here.

This bugfix is somewhat critical, particularly in production environments, as the underlying bug can lead to corrupted Darshan log file data and, potentially, application crashes (though we have not triggered any crashes in our testing). The issue was originally detected by noticing bogus values in the COMMON_ACCESS counters reported by the POSIX, MPIIO, and H5 modules.

In any case, we highly recommend that all 3.2.0 users upgrade to this version to avoid any potential for crashes or corrupted Darshan log file data.

Please report any additional questions, issues, or concerns using the Darshan-users mailing list, or by opening an issue on the Darshan GitLab page.

Darshan version 3.2.0 is now officially available

Darshan 3.2.0 is now available for download here.

This release contains a number of new features, bug fixes, and other changes to Darshan. Some of the more notable changes that may be of interest to users:

  • Added detailed instrumentation of HDF5 file (H5F) and dataset (H5D) interfaces.
    • Must be explicitly enabled by passing “--enable-hdf5-mod=/path/to/hdf5/install” when configuring Darshan.
    • Due to ABI incompatibility from HDF5 version 1.8.x -> 1.10.x, special care must be taken to ensure users do not link applications with HDF5 versions that are incompatible with the version the Darshan library was built with (i.e., both HDF5 library versions must be either >=1.10 or <1.10). Using two incompatible HDF5 versions will lead to either link or runtime failures.
    • Support only intended for HDF5 versions 1.8.0+.
  • Added new feature allowing for instrumentation of non-MPI applications.
    • Darshan no longer strictly requires that instrumented applications use MPI, extending coverage to a breadth of new contexts.
    • Note that this feature is only functional in dynamic linking use cases.
    • Thanks to Glenn Lockwood (NERSC) for his help in implementing/testing this feature.
  • Added MPI-IO offset information to Darshan’s DXT tracing mechanism.
  • Updated Darshan compiler wrappers and Cray software modules to transparently and uniformly support dynamic and static linking cases. These methods previously only supported static linking use cases.
  • Re-implemented Darshan’s PMPI/MPI wrappers to help avoid deadlock with other monitoring tools that rely on PMPI.
  • Added new “--log-path” option to darshan-config utility to allow users to more easily query the directory Darshan logs are stored in.
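
The HDF5 compatibility rule in the list above (both library versions must fall on the same side of 1.10) can be illustrated with a small check. This is only a sketch: the function names are hypothetical, they are not part of Darshan or HDF5, and the version strings are assumed to be in the usual "major.minor.patch" form.

```python
def major_minor(version):
    """Extract (major, minor) from a dotted version string such as '1.10.5'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def abi_family(version):
    """Classify an HDF5 version as pre-1.10 or 1.10-and-later.

    Versions older than 1.8.0 are rejected, since the announcement states
    that support is only intended for HDF5 1.8.0+.
    """
    mm = major_minor(version)
    if mm < (1, 8):
        raise ValueError("HDF5 %s is older than 1.8.0" % version)
    return "1.10+" if mm >= (1, 10) else "pre-1.10"


def hdf5_compatible(darshan_build_version, app_link_version):
    """Return True if both versions are >=1.10 or both are <1.10."""
    return abi_family(darshan_build_version) == abi_family(app_link_version)


if __name__ == "__main__":
    # Hypothetical version pairs, used only to exercise the rule.
    print(hdf5_compatible("1.10.5", "1.12.0"))
    print(hdf5_compatible("1.8.21", "1.10.5"))
```

A check like this could run in a site's build scripts before an application is linked, but it only encodes the rule as stated here; it does not inspect the libraries themselves.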

Please review darshan-runtime and darshan-util documentation for more details on the new HDF5 instrumentation module and the experimental non-MPI instrumentation mechanism. Additionally, consult the ChangeLog in the top-level of the source for a full list of changes associated with this release.

Note that we are currently aware of and looking into a couple of issues related to Lustre file systems that have been reported by Darshan users:

  • Crashes in Darshan’s Lustre module in newer Lustre versions (2.11.x in one reported case). Typically results in additional errors stating: “using old ioctl(LL_IOC_LOV_GETSTRIPE)”.
    • If you experience this problem with Darshan, a temporary workaround is to disable the Lustre module; this can only be done at configure time by passing “--disable-lustre-mod”.
  • Floating point exceptions or other warnings related to dividing by zero when writing Darshan log to a Lustre file system (at Darshan shutdown time).
    • We are still working out what combinations of MPI and Lustre libraries exhibit this problem, but a simple workaround for the time being is to run the command “export DARSHAN_LOGHINTS=” before running your application.
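
For jobs launched from a script rather than an interactive shell, the DARSHAN_LOGHINTS workaround above can be applied by setting the variable to an empty string in the child process environment. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and the command to launch is whatever the caller supplies on the command line.

```python
import os
import subprocess
import sys


def env_without_loghints(base_env=None):
    """Copy an environment and set DARSHAN_LOGHINTS to the empty string.

    This mirrors running `export DARSHAN_LOGHINTS=` in a shell.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["DARSHAN_LOGHINTS"] = ""
    return env


def run_with_workaround(command):
    """Run `command` (a list of arguments) with the workaround applied."""
    completed = subprocess.run(command, env=env_without_loghints())
    return completed.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is treated as the command to launch.
    sys.exit(run_with_workaround(sys.argv[1:]))
```

The caller's own environment is left untouched; only the launched process sees the empty variable.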

We hope to resolve these bugs quickly and intend to release an updated version of Darshan once they are.

Please report any additional questions, issues, or concerns using the Darshan-users mailing list, or by opening an issue on the Darshan GitLab page.

New experimental version of Darshan available for instrumenting non-MPI applications

An experimental pre-release of Darshan is now available that enables instrumentation of non-MPI workloads. It can be downloaded here. It is NOT recommended to use this version in production until we have had more time for users to test it.

See the darshan-runtime documentation (located in darshan-runtime/docs from the top-level Darshan repo) for more information on how to build Darshan without MPI support and also how to enable non-MPI instrumentation at application runtime.

Note that this instrumentation method only works on dynamically-linked executables — Darshan still does not support instrumentation of statically-linked non-MPI executables.

We encourage users who are interested in characterizing I/O in non-MPI contexts to try out this new functionality and let us know about any issues or questions they might have! Depending on user experience, we will try to get a release of this software suitable for production deployment soon.

Darshan at SC19 recap

In case you missed any of it, here’s a list of Darshan-related activities from SC that may be of interest to the community:

Darshan version 3.1.8 now available

Darshan 3.1.8 is now available for download here.

This release introduces a new trace triggering mechanism that allows users to specify triggers that dictate which files are traced using Darshan’s tracing module, DXT. Users just need to provide Darshan a configuration file describing the triggers and Darshan will decide at runtime which files to store trace data for. Types of triggers include file- and rank-based triggers (based on regex patterns), as well as file access characteristics triggers (to trace based on frequency of small or unaligned I/O accesses). Please refer to darshan-runtime documentation on the DXT module for more details.
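
To make the trigger idea concrete, here is a conceptual sketch of how regex-based and access-characteristic triggers might select files for tracing. It does not reproduce Darshan's configuration file format or its internal logic: the FileStats record, the parameter names, and the rule that any single matching trigger enables tracing are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file record; not a Darshan data structure."""
    path: str
    rank: int
    total_accesses: int
    small_accesses: int
    unaligned_accesses: int


def fraction(part, total):
    """Fraction of accesses, treating a file with no accesses as 0.0."""
    return part / total if total else 0.0


def should_trace(stats, file_pattern=None, rank_pattern=None,
                 small_io_threshold=None, unaligned_io_threshold=None):
    """Return True if any configured trigger matches (assumed semantics)."""
    # File-based trigger: regex searched within the file path.
    if file_pattern is not None and re.search(file_pattern, stats.path):
        return True
    # Rank-based trigger: regex must match the whole rank number.
    if rank_pattern is not None and re.fullmatch(rank_pattern, str(stats.rank)):
        return True
    # Access-characteristic triggers: trace when the fraction of small or
    # unaligned accesses exceeds the given threshold.
    if small_io_threshold is not None and fraction(
            stats.small_accesses, stats.total_accesses) > small_io_threshold:
        return True
    if unaligned_io_threshold is not None and fraction(
            stats.unaligned_accesses, stats.total_accesses) > unaligned_io_threshold:
        return True
    return False
```

With no triggers configured, this sketch traces nothing. The actual trigger syntax and matching rules are described in the darshan-runtime documentation on the DXT module.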

Note that full tracing is disabled by default in Darshan and this release does not change that — this is just a mechanism to allow DXT users more control over tracing.

Please report any questions, issues, or concerns using the Darshan-users mailing list, or by opening an issue on the Darshan GitLab page.

Darshan 3.1.7 release is now available

Darshan version 3.1.7 is now available for download here! This version fixes a few bugs in the prior Darshan release and also contains a couple of new features:

  • Bug fix in handling of DXT module data in the darshan-convert utility
    • Reported by Mahzad Khoshlessan
  • Bug fix in darshan-parser backwards compatibility: Darshan logs generated by Darshan versions prior to 3.1.0 may have included STDIO counters that were not properly up-converted
    • Reported by Teng Wang
  • Bug fix to MiB reported in I/O performance estimate of darshan-job-summary when both POSIX and STDIO data are present
    • Reported/fixed by Glenn Lockwood
  • Added Darshan wrapper for ‘__open_2()’ call, needed for properly instrumenting open operations with some versions of gcc/glibc
    • Reported by Cormac Garvey
  • Added an instrumentation module for the MDHIM key/val storage system
  • Added support for properly handling ‘rename()’, ‘dup()’, ‘fileno()’, and ‘fdopen()’ operations in Darshan

Please report any questions, issues, or concerns using the Darshan-users mailing list, or by opening an issue on the Darshan GitLab page.

Darshan wins 2018 R&D 100 award

We are proud to announce that the Darshan team at Argonne National Laboratory has won a 2018 R&D 100 award!  This prestigious award is given to the 100 top technologies of the year as chosen by R&D Magazine.  We would like to sincerely thank the entire Darshan user community for supporting us and helping to make the project so successful!
For more information about the award please see the Darshan R&D 100 award news article at MCS.

Argonne National Laboratory’s R&D 100 award winning Darshan team: Kevin Harms (ALCF), Shane Snyder (MCS), Phil Carns (MCS), Rob Ross (MCS), and Rob Latham (MCS)

Phil Carns accepting the R&D 100 award at the 2018 R&D 100 banquet

Darshan at SC18

Stop by and check out the following events related to Darshan at SC18:

  • Glenn Lockwood et al. “A Year in the Life of a Parallel File System”, SC18 technical program paper.
  • Robert Latham et al. “Parallel I/O in Practice”, Tutorial.
  • Jakob Luettgau et al. “Toward Understanding I/O Behavior in HPC Workflows”, PDSW workshop paper.
  • Philip Carns et al. “Analyzing Parallel I/O”, Birds of a Feather session.