Darshan – HPC I/O Characterization Tool

Darshan version 3.3.0-pre2 is now available

April 30, 2021 by Snyder, Shane

We are happy to announce a new pre-release for Darshan 3.3.0 (darshan-3.3.0-pre2). You can download the source HERE.

This release contains a number of new features, bug fixes, and other improvements as detailed below:

New PyDarshan Python package for analyzing Darshan log files
- PyDarshan provides a couple of interfaces to Darshan logs that should allow for easier development of custom Darshan log analysis utilities in Python
- See the PyDarshan documentation for more details
- Thanks to Jakob Luettgau (DKRZ) for all of the hard work in contributing this package
Bug fixes
- Modified Lustre module to use a safer method for obtaining Lustre file striping information (based on fgetxattr rather than ioctl)
- Fixed bug leading to potential deadlock when reducing shared records in MPI programs (known to affect mvapich2)
- Fixed bug causing errors in Darshan’s non-MPI mode when Darshan is built with an MPI compiler
- Disabled DXT’s MPI-IO offset tracking for OpenMPI applications to avoid crashes caused by an OpenMPI bug
- Fixed various HDF5 module bugs:
  - Fixes for applications using H5S_SELECT_NONE selections resulting in HDF5 error messages
  - Fixes for applications using non-MPIIO VFDs resulting in HDF5 error messages
  - Fixes for potentially incorrect counter values related to common accesses in the H5D module
  - Other fixes allowing usage of the HDF5 modules in serial HDF5 applications
Other enhancements
- Added support for querying Lustre file striping statistics for Lustre files that are symlinked from other file systems
- Added support for instrumenting openat, preadv, preadv2, pwritev, and pwritev2 functions, improving instrumentation of OpenMPI applications
- Improved error messages and documentation for darshan-util tools, including handling of incomplete Darshan log files
- Added new H5D module counter indicating the Darshan record ID of the file an HDF5 dataset belongs to
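
As a rough illustration of the kind of custom analysis utility the PyDarshan package is meant to enable, the sketch below opens a log and lists the modules it contains. It assumes PyDarshan is installed and that its DarshanReport class exposes a modules dictionary, which is our reading of the PyDarshan documentation; the helper name loaded_modules and the log file name are hypothetical. Treat it as a starting point and check the PyDarshan documentation for the interfaces in your installed version.

```python
def loaded_modules(log_path):
    """Return the sorted names of the modules recorded in a Darshan log.

    Assumes the PyDarshan package is installed; the helper name is
    hypothetical and only meant to illustrate a small analysis utility.
    """
    # Imported lazily so this file can be loaded without PyDarshan installed.
    import darshan

    # read_all=True asks PyDarshan to load all records when opening the log.
    report = darshan.DarshanReport(log_path, read_all=True)

    # Assumption: report.modules is a dictionary keyed by module name.
    return sorted(report.modules.keys())


if __name__ == "__main__":
    import sys

    # Usage: python list_modules.py example.darshan  (file name is hypothetical)
    for name in loaded_modules(sys.argv[1]):
        print(name)
```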

As always, please report any issues, comments, or questions to us using the Darshan-users mailing list or our GitLab page.

Updated public data sets

September 23, 2020 by carns

The data page on this web site has been updated to include new public data sets, including summary data provided by the ALCF and anonymized logs provided by the NCSA.

Darshan 3.2.1 bugfix release available

May 15, 2020 by carns

Due to a reported bug in last week’s 3.2.0 release of Darshan, we have decided to quickly release Darshan 3.2.1 for our users. It is available for download here.

This bugfix is somewhat critical, particularly in production environments, as the bug can lead to corrupted Darshan log file data and, potentially, application crashes (though we have not triggered any crashes in our testing). The issue was originally detected by noticing bogus values in the COMMON_ACCESS counters reported by the POSIX, MPIIO, and H5 modules.

In any case, we highly recommend any 3.2.0 users upgrade to this version to avoid any potential for crashes or corrupted Darshan log file data.

Please report any additional questions, issues, or concerns using the Darshan-users mailing list, or by opening an issue on the Darshan GitLab page.

Darshan version 3.2.0 is now officially available

May 5, 2020 by carns

Darshan 3.2.0 is now available for download here.

This release contains a number of new features, bug fixes, and other changes to Darshan. Some of the more notable changes that may be of interest to users:

Added detailed instrumentation of HDF5 file (H5F) and dataset (H5D) interfaces.
- Must be explicitly enabled by passing “--enable-hdf5-mod=/path/to/hdf5/install” when configuring Darshan.
- Due to ABI incompatibility from HDF5 version 1.8.x -> 1.10.x, special care must be taken to ensure users do not link applications with HDF5 versions that are incompatible with the version the Darshan library was built with (i.e., both HDF5 library versions must be either >=1.10 or <1.10). Using two incompatible HDF5 versions will lead to either link or runtime failures.
- Support only intended for HDF5 versions 1.8.0+.
Added new feature allowing for instrumentation of non-MPI applications.
- Darshan no longer strictly requires that instrumented applications use MPI, extending coverage to a breadth of new contexts.
- Note that this feature is only functional in dynamic linking use cases.
- Thanks to Glenn Lockwood (NERSC) for his help in implementing/testing this feature.
Added MPI-IO offset information to Darshan’s DXT tracing mechanism.
Updated Darshan compiler wrappers and Cray software modules to transparently and uniformly support dynamic and static linking cases. These methods previously only supported static linking use cases.
Re-implemented Darshan’s PMPI/MPI wrappers to help avoid deadlock with other monitoring tools that rely on PMPI.
Added new “--log-path” option to darshan-config utility to allow users to more easily query the directory Darshan logs are stored in.

Please review darshan-runtime and darshan-util documentation for more details on the new HDF5 instrumentation module and the experimental non-MPI instrumentation mechanism. Additionally, consult the ChangeLog in the top-level of the source for a full list of changes associated with this release.
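
The HDF5 compatibility rule noted above (both HDF5 versions must be either >=1.10 or <1.10) can be checked ahead of time with a few lines of Python. This is only a sketch of the rule as stated in this announcement, not something shipped with Darshan; the function names are hypothetical, and the version strings are assumed to look like "1.10.6".

```python
from typing import Tuple


def hdf5_series(version: str) -> Tuple[int, int]:
    """Return (major, minor) parsed from a version string such as '1.10.6'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def hdf5_compatible(darshan_hdf5: str, app_hdf5: str) -> bool:
    """True when both versions fall on the same side of the 1.10 boundary.

    darshan_hdf5: HDF5 version the Darshan library was built with.
    app_hdf5:     HDF5 version the application is linked against.
    """
    boundary = (1, 10)
    darshan_is_new = hdf5_series(darshan_hdf5) >= boundary
    app_is_new = hdf5_series(app_hdf5) >= boundary
    return darshan_is_new == app_is_new


print(hdf5_compatible("1.10.6", "1.12.0"))  # True: both are >= 1.10
print(hdf5_compatible("1.8.21", "1.10.6"))  # False: the versions straddle 1.10
```

A check like this only covers the 1.10 boundary; it does not verify the separate requirement that HDF5 be version 1.8.0 or newer.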

Note that we are currently aware of and looking into a couple of issues related to Lustre file systems that have been reported by Darshan users:

Crashes in Darshan’s Lustre module in newer Lustre versions (2.11.x in one reported case). Typically results in additional errors stating: “using old ioctl(LL_IOC_LOV_GETSTRIPE)”.
- If you experience this problem with Darshan, a temporary workaround would be to just disable the Lustre module; this can only be done at configure time by passing “--disable-lustre-mod”.
Floating point exceptions or other warnings related to dividing by zero when writing Darshan log to a Lustre file system (at Darshan shutdown time).
- We are still working out what combinations of MPI and Lustre libraries exhibit this problem, but a simple workaround for the time being is to run the command “export DARSHAN_LOGHINTS=” before running your application.
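
For users who launch applications from Python scripts rather than an interactive shell, the DARSHAN_LOGHINTS workaround above can be applied per launch instead of through a shell-wide export. The following is a minimal sketch: the helper names are hypothetical, and the child Python process simply stands in for a real application.

```python
import os
import subprocess
import sys


def env_without_loghints(base=None):
    """Copy an environment and set DARSHAN_LOGHINTS to the empty string.

    This mirrors running `export DARSHAN_LOGHINTS=` in a shell.
    """
    env = dict(os.environ if base is None else base)
    env["DARSHAN_LOGHINTS"] = ""
    return env


def run_with_workaround(command):
    """Run a command with the workaround applied and return its exit code."""
    return subprocess.run(command, env=env_without_loghints()).returncode


if __name__ == "__main__":
    # Stand-in for the real application: a child Python process that exits at once.
    code = run_with_workaround([sys.executable, "-c", "pass"])
    print("exit code:", code)
```

Setting the variable only in the child's environment keeps the workaround scoped to the instrumented run.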

We hope to resolve these bugs quickly and intend to release an updated version of Darshan once they are.

Please report any additional questions, issues, or concerns using the Darshan-users mailing list, or by opening an issue on the Darshan GitLab page.

New experimental version of Darshan available for instrumenting non-MPI applications

December 11, 2019 by carns

An experimental pre-release of Darshan is now available that enables instrumentation of non-MPI workloads. It can be downloaded here. It is NOT recommended to use this version in production until we have had more time for users to test it.

See the darshan-runtime documentation (located in darshan-runtime/docs from the top-level Darshan repo) for more information on how to build Darshan without MPI support and also how to enable non-MPI instrumentation at application runtime.

Note that this instrumentation method only works on dynamically-linked executables — Darshan still does not support instrumentation of statically-linked non-MPI executables.

We encourage users that are interested in characterizing I/O in non-MPI contexts to try out this new functionality and let us know about any issues or questions you might have! Depending on user experience, we will try to get a release of this software suitable for production deployment soon.

Darshan at SC19 recap

December 11, 2019 by carns

In case you missed any of it, here’s a list of Darshan-related activities from SC that may be of interest to the community:

Glenn Lockwood et al. “Understanding Data Motion in the Modern HPC Data Center”, PDSW workshop paper.
Bing Xie et al. “Applying Machine Learning to Understand Write Performance of Large-scale Parallel Filesystems”, PDSW workshop paper.
Shane Snyder et al. “Analyzing Parallel I/O”, Birds of a Feather session.

Darshan version 3.1.8 now available

November 8, 2019 by carns

Darshan 3.1.8 is now available for download here.

This release introduces a new trace triggering mechanism that allows users to specify triggers that dictate which files are traced using Darshan’s tracing module, DXT. Users just need to provide Darshan a configuration file describing the triggers and Darshan will decide at runtime which files to store trace data for. Types of triggers include file- and rank-based triggers (based on regex patterns), as well as file access characteristics triggers (to trace based on frequency of small or unaligned I/O accesses). Please refer to darshan-runtime documentation on the DXT module for more details.
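
To make the trigger idea more concrete, here is a conceptual Python sketch of how a set of triggers might decide whether a file is traced. It is not Darshan's implementation and does not reflect the actual configuration file format; the FileStats type, the function name, the any-match policy, and the threshold semantics are all assumptions made for illustration. The darshan-runtime documentation remains the reference for real trigger syntax.

```python
import re
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file summary; not Darshan's internal record layout."""
    path: str
    rank: int
    total_accesses: int
    small_accesses: int


def should_trace(stats, file_patterns, rank_patterns, small_io_threshold):
    """Return True if any trigger matches (an assumed any-match policy)."""
    # File-based trigger: a regex searched against the file path.
    if any(re.search(p, stats.path) for p in file_patterns):
        return True
    # Rank-based trigger: a regex that must match the whole rank number.
    if any(re.fullmatch(p, str(stats.rank)) for p in rank_patterns):
        return True
    # Access-characteristics trigger: fraction of small accesses above a threshold.
    if stats.total_accesses > 0:
        small_fraction = stats.small_accesses / stats.total_accesses
        if small_fraction > small_io_threshold:
            return True
    return False


hdf5_file = FileStats("/scratch/run/output.h5", rank=3, total_accesses=100, small_accesses=10)
text_file = FileStats("/scratch/run/log.txt", rank=3, total_accesses=100, small_accesses=10)

print(should_trace(hdf5_file, [r"\.h5$"], [r"[0-1]"], 0.5))  # True: path matches
print(should_trace(text_file, [r"\.h5$"], [r"[0-1]"], 0.5))  # False: no trigger matches
```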

Note that full tracing is disabled by default in Darshan and this release does not change that — this is just a mechanism to allow DXT users more control over tracing.

Please report any questions, issues, or concerns using the Darshan-users mailing list, or by opening an issue on the Darshan GitLab page.

Software Innovation Boosts Efficiency for World’s Fastest Supercomputers

March 11, 2019 by carns

R&D Magazine recently published an article by Laura French entitled “Software Innovation Boosts Efficiency for World’s Fastest Supercomputers”. It highlights Darshan’s impact in the field and the achievements that contributed to the 2018 R&D 100 award.

Examples of systems that have deployed Darshan in production for 24/7 instrumentation.

Darshan 3.1.7 release is now available

January 22, 2019 by carns

Darshan version 3.1.7 is now available for download HERE! This version fixes a few bugs in the prior Darshan release and also contains a couple of new features:

Bug fix in handling of DXT module data in the darshan-convert utility
- Reported by Mahzad Khoshlessan
Bug fix in darshan-parser backwards compatibility: Darshan logs generated by Darshan versions prior to 3.1.0 may have included STDIO counters that were not properly up-converted
- Reported by Teng Wang
Bug fix to MiB reported in I/O performance estimate of darshan-job-summary when both POSIX and STDIO data are present
- Reported/fixed by Glenn Lockwood
Added Darshan wrapper for ‘__open_2()’ call, needed for properly instrumenting open operations with some versions of gcc/glibc
- Reported by Cormac Garvey
Added an instrumentation module for the MDHIM key/val storage system
Added support for properly handling ‘rename()’, ‘dup()’, ‘fileno()’, and ‘fdopen()’ operations in Darshan

Please report any questions, issues, or concerns using the Darshan-users mailing list, or by opening an issue on the Darshan GitLab page.

Darshan wins 2018 R&D 100 award

December 6, 2018 by carns

We are proud to announce that the Darshan team at Argonne National Laboratory has won a 2018 R&D 100 award! This prestigious award is given to the 100 top technologies of the year as chosen by R&D Magazine. We would like to sincerely thank the entire Darshan user community for supporting us and helping to make the project so successful!
For more information about the award please see the Darshan R&D 100 award news article at MCS.