darshan.experimental.plots package

Submodules

darshan.experimental.plots.data_access_by_filesystem module

darshan.experimental.plots.heatmap_handling module

Module of data pre-processing functions for constructing the heatmap figure.

class darshan.experimental.plots.heatmap_handling.SegDict(*args, **kwargs)

Bases: dict

Custom type hint class for the dict_list argument in get_rd_wr_dfs().

hostname: str
id: int
rank: int
read_count: int
read_segments: DataFrame
write_count: int
write_segments: DataFrame
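A minimal sketch of an equivalent record structure, assuming SegDict behaves like a typing.TypedDict (an assumption: the docs above only say it subclasses dict). SegDictSketch is a hypothetical name, and the two segment fields are typed Any rather than pd.DataFrame so the sketch needs no pandas import.

```python
from typing import Any, TypedDict


class SegDictSketch(TypedDict):
    """Hypothetical stand-in for SegDict: the keys of one DXT record."""

    hostname: str
    id: int
    rank: int
    read_count: int
    read_segments: Any  # a pd.DataFrame in pydarshan; Any avoids a pandas import
    write_count: int
    write_segments: Any


# At runtime a TypedDict value is an ordinary dict.
record: SegDictSketch = {
    "hostname": "sn176.localdomain",
    "id": 14388265063268455899,
    "rank": 0,
    "read_count": 0,
    "read_segments": [],
    "write_count": 1,
    "write_segments": [{"offset": 0, "length": 40,
                        "start_time": 0.103379, "end_time": 0.103388}],
}
```

Because TypedDict only affects static type checking, code that consumes such records can keep treating them as plain dictionaries.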
darshan.experimental.plots.heatmap_handling.get_aggregate_data(report: Any, mod: str = 'DXT_POSIX', ops: Sequence[str] = ['read', 'write']) → DataFrame

Aggregates the data based on which modules and operations are selected.

Parameters

report: a darshan.DarshanReport.

mod: the DXT module to analyze (e.g. “DXT_POSIX” or “DXT_MPIIO”). Default is "DXT_POSIX".

ops: a sequence of keys designating which Darshan operations to use for data aggregation. Default is ["read", "write"].

Returns

agg_df: a pd.DataFrame containing the aggregated data determined by the input modules and operations.

Raises

ValueError: raised if the selected modules/operations don’t contain any data.

Notes

Since read and write events are treated as distinct events, if both are selected their dataframes are simply concatenated.
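The concatenation described in the Notes can be sketched with the standard library, assuming each operation's events arrive as a list of row dictionaries standing in for a pd.DataFrame (in pydarshan itself this is presumably a pd.concat call). aggregate_events is a hypothetical helper, not part of the package; it also mirrors the documented ValueError when the selection holds no data.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]  # one IO event: length, start_time, end_time, rank


def aggregate_events(rd_wr: Dict[str, List[Row]],
                     ops: Sequence[str] = ("read", "write")) -> List[Row]:
    """Concatenate the event rows of each selected operation (hypothetical helper)."""
    agg: List[Row] = []
    for op in ops:
        # read and write events are distinct, so rows are appended, never merged
        agg.extend(rd_wr.get(op, []))
    if not agg:
        raise ValueError(f"No data available for operations {list(ops)}")
    return agg
```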

Examples

agg_df generated from tests/input/sample-dxt-simple.darshan:

       length  start_time  end_time  rank
    0      40    0.103379  0.103388     0
    1    4000    0.104217  0.104231     0

darshan.experimental.plots.heatmap_handling.get_heatmap_df(agg_df: DataFrame, xbins: int, nprocs: int, max_time: float | None = None) → DataFrame

Builds an array similar to a 2D histogram, where the y data are the unique ranks and the x data is time. Each bin holds the amount of data read/written by that rank during the time spanned by the bin: an IO event that falls entirely within a bin contributes its full size, while an event spanning several bins contributes a proportionate share to each.

Parameters

agg_df: a pd.DataFrame containing the aggregated data determined by the input modules and operations.

xbins: the number of x-axis bins to create.

nprocs: the number of MPI ranks/processes used at runtime.

max_time: the maximum time to bin over, since input DXT data is not necessarily bounded by the wallclock duration. Default is None.

Returns

hmap_df: dataframe with time intervals for columns and rank index (0, 1, etc.) for rows, where each element contains the data read/written by the corresponding rank in the given time interval.

Examples

The first column/bin for the hmap_df generated from “examples/example-logs/ior_hdf5_example.darshan”:

          (0.0, 0.09552296002705891]
    rank
    0                   8.951484e+05
    1                   3.746313e+05
    2                   6.350999e+05
    3                   1.048576e+06
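A stdlib sketch of this binning, under stated assumptions: events are (rank, length, start_time, end_time) tuples rather than a DataFrame, bins have equal width over [0, max_time], events end by max_time, and an event spanning several bins is split in proportion to its time overlap with each bin (our reading of "proportionate data sum"). heatmap_bins is hypothetical and is not the pydarshan implementation.

```python
from typing import List, Sequence, Tuple

# (rank, length in bytes, start_time, end_time)
Event = Tuple[int, int, float, float]


def heatmap_bins(events: Sequence[Event], xbins: int, nprocs: int,
                 max_time: float) -> List[List[float]]:
    """Return an nprocs x xbins grid of bytes moved per rank per time bin (sketch)."""
    width = max_time / xbins
    grid = [[0.0] * xbins for _ in range(nprocs)]
    for rank, length, start, end in events:
        # clamp indices so an event ending exactly at max_time maps to the last bin
        first = min(int(start // width), xbins - 1)
        last = min(int(end // width), xbins - 1)
        if first >= last or end <= start:
            # the whole event lies in one bin: add its full length there
            grid[rank][first] += length
            continue
        duration = end - start
        for b in range(first, last + 1):
            overlap = min(end, (b + 1) * width) - max(start, b * width)
            # the event contributes in proportion to its time overlap with bin b
            grid[rank][b] += length * overlap / duration
    return grid
```

For example, with two 1-second bins, a 100-byte event from 0.75 s to 1.25 s adds 50 bytes to each bin, while a 100-byte event from 0.0 s to 0.4 s adds all 100 bytes to the first bin.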

darshan.experimental.plots.heatmap_handling.get_rd_wr_dfs(dict_list: Sequence[SegDict], ops: Sequence[str] = ['read', 'write']) → Dict[str, DataFrame]

Uses the DXT records to construct individual dataframes for both read and write segments.

Parameters

dict_list: a sequence of DXT records, where each record is a Python dictionary with the following keys: ‘id’, ‘rank’, ‘hostname’, ‘write_count’, ‘read_count’, ‘write_segments’, and ‘read_segments’. The read/write data is stored in read_segments and write_segments, each of which is a pd.DataFrame containing the following columns: ‘offset’, ‘length’, ‘start_time’, ‘end_time’.

ops: a sequence of keys designating which Darshan operations to collect data for. Default is ["read", "write"].

Returns

rd_wr_dfs: dictionary where each key is an operation from the input ops parameter (e.g. “read”, “write”) and each value is a pd.DataFrame object containing all of the read/write events.

Notes

Used in get_single_df_dict().
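The flattening step can be sketched as follows, assuming each record's segments are lists of row dictionaries standing in for DataFrames. flatten_segments is a hypothetical helper; its column choice (dropping ‘offset’ and adding ‘rank’) mirrors the rd_wr_dfs example in this section rather than any confirmed implementation detail.

```python
from typing import Any, Dict, List, Sequence


def flatten_segments(dict_list: Sequence[Dict[str, Any]],
                     ops: Sequence[str] = ("read", "write")) -> Dict[str, List[Dict[str, Any]]]:
    """Collect every segment of each op across records, tagged with its rank (hypothetical)."""
    out: Dict[str, List[Dict[str, Any]]] = {op: [] for op in ops}
    for rec in dict_list:
        for op in ops:
            for seg in rec[f"{op}_segments"]:
                # keep the columns shown in the rd_wr_dfs example and add the rank
                out[op].append({
                    "length": seg["length"],
                    "start_time": seg["start_time"],
                    "end_time": seg["end_time"],
                    "rank": rec["rank"],
                })
    return out
```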

Examples

dict_list and rd_wr_dfs generated from tests/input/sample-dxt-simple.darshan:

    dict_list = [
        {
            'id': 14388265063268455899,
            'rank': 0,
            'hostname': 'sn176.localdomain',
            'write_count': 1,
            'read_count': 0,
            'write_segments':
                   offset  length  start_time  end_time
                0       0      40    0.103379  0.103388,
            'read_segments':
                Empty DataFrame
                Columns: []
                Index: []
        },
        {
            'id': 9457796068806373448,
            'rank': 0,
            'hostname': 'sn176.localdomain',
            'write_count': 1,
            'read_count': 0,
            'write_segments':
                   offset  length  start_time  end_time
                0       0    4000    0.104217  0.104231,
            'read_segments':
                Empty DataFrame
                Columns: []
                Index: []
        },
    ]

    rd_wr_dfs = {
        'read':
            Empty DataFrame
            Columns: []
            Index: [],
        'write':
               length  start_time  end_time  rank
            0      40    0.103379  0.103388     0
            1    4000    0.104217  0.104231     0
    }

darshan.experimental.plots.heatmap_handling.get_single_df_dict(report: Any, mod: str = 'DXT_POSIX', ops: Sequence[str] = ['read', 'write']) → Dict[str, DataFrame]

Reorganizes the segmented read/write data for a single DXT module into one pd.DataFrame per operation and stores them in a dictionary keyed by operation.

Parameters

report: a darshan.DarshanReport.

mod: the DXT module to analyze (e.g. “DXT_POSIX” or “DXT_MPIIO”). Default is "DXT_POSIX".

ops: a sequence of keys designating which Darshan operations to use for data aggregation. Default is ["read", "write"].

Returns

flat_data_dict: a dictionary with an entry for each input operation (e.g. “read”/“write”) that maps to a dataframe containing all events for that operation.

Examples

flat_data_dict generated from tests/input/sample-dxt-simple.darshan:

    {
        'read':
            Empty DataFrame
            Columns: []
            Index: [],
        'write':
               length  start_time  end_time  rank
            0      40    0.103379  0.103388     0
            1    4000    0.104217  0.104231     0
    }

darshan.experimental.plots.plot_access_histogram module

darshan.experimental.plots.plot_access_histogram.autolabel(ax, rects)

Attach a text label above each bar in rects, displaying its value.

darshan.experimental.plots.plot_access_histogram.plot_access_histogram(report, mod, ax=None)

Plots a histogram of access sizes for the specified module.

Args:

report (darshan.DarshanReport): report to generate the plot from.

mod (str): module name for which to generate the access histogram.

darshan.experimental.plots.plot_common_access_table module

class darshan.experimental.plots.plot_common_access_table.DarshanReportTable(df: Any, **kwargs)

Bases: object

Stores table figures in dataframe and html formats.

Parameters

df: a pd.DataFrame.

kwargs: keyword arguments passed to pd.DataFrame.to_html().

darshan.experimental.plots.plot_common_access_table.collapse_access_cols(df: Any, col_name: str) → Any

Collapses all columns into a single column named col_name.

Parameters

df: a pd.DataFrame.

col_name: name of new column to store collapsed data.

Returns

A pd.DataFrame containing all data collapsed into column col_name.

darshan.experimental.plots.plot_common_access_table.combine_access_sizes(df: Any) → Any

Combines rows with identical values in the “Access Size” column and calculates the sum for all other numeric columns.

Parameters

df: a pd.DataFrame with a column named “Access Size”.

Returns

A pd.DataFrame where the “Access Size” column is the index and the remaining columns contain the summed data from grouped rows.
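The grouping can be sketched with the standard library, assuming (access size, count) tuples in place of a DataFrame; in pandas the same operation is presumably along the lines of df.groupby("Access Size").sum(). combine_sizes is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def combine_sizes(rows: Iterable[Tuple[int, int]]) -> Dict[int, int]:
    """Sum the counts of rows sharing the same access size (hypothetical helper)."""
    totals: Dict[int, int] = defaultdict(int)
    for access_size, count in rows:
        # identical access sizes collapse into one entry whose count accumulates
        totals[access_size] += count
    return dict(totals)
```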

darshan.experimental.plots.plot_common_access_table.get_access_count_df(mod_df: Any, mod: str) → Any

Creates a dataframe containing only the access size and count data.

Parameters

mod_df: “counters” dataframe for the input module mod from a darshan.DarshanReport.

mod: the module to obtain the common accesses table for (e.g. “POSIX”, “MPI-IO”, “H5D”).

Returns

A pd.DataFrame containing all access size data and their respective counts.
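A sketch of pairing access sizes with their counts for one record, assuming the counters follow the POSIX naming pattern POSIX_ACCESS{1..4}_ACCESS and POSIX_ACCESS{1..4}_COUNT and that a record is a plain dict of counter name to value. The prefix used by other modules may differ (MPI-IO counters, for instance, use an MPIIO_ prefix), and access_count_pairs is a hypothetical helper rather than the function documented above.

```python
from typing import Dict, List, Tuple


def access_count_pairs(counters: Dict[str, int], prefix: str = "POSIX") -> List[Tuple[int, int]]:
    """Pair each ACCESSn_ACCESS size with its ACCESSn_COUNT (hypothetical helper)."""
    pairs: List[Tuple[int, int]] = []
    for i in range(1, 5):  # assumed: four most-common-access slots per record
        size = counters.get(f"{prefix}_ACCESS{i}_ACCESS", 0)
        count = counters.get(f"{prefix}_ACCESS{i}_COUNT", 0)
        pairs.append((size, count))
    return pairs
```

Unused slots come back as (0, 0) pairs in this sketch.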

darshan.experimental.plots.plot_common_access_table.get_most_common_access_sizes(df: Any, n_rows: int = 4) → Any

Returns the rows with the n_rows largest “Count” values.

Parameters

df: a pd.DataFrame with a column named “Count”.

n_rows: number of rows to keep. Default is 4.

Returns

A pd.DataFrame containing the largest n_rows “Count” values.
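Selecting the most common sizes can be sketched with heapq, assuming a dict mapping access size to count instead of a DataFrame; the pandas analogue is presumably close to df.nlargest(n_rows, "Count"). most_common_sizes is a hypothetical name.

```python
import heapq
from typing import Dict, List, Tuple


def most_common_sizes(counts: Dict[int, int], n_rows: int = 4) -> List[Tuple[int, int]]:
    """Return the n_rows (access size, count) pairs with the largest counts."""
    # heapq.nlargest returns the results sorted from largest to smallest count
    return heapq.nlargest(n_rows, counts.items(), key=lambda item: item[1])
```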

darshan.experimental.plots.plot_common_access_table.plot_common_access_table(report: DarshanReport, mod: str, n_rows: int = 4) → DarshanReportTable

Creates a table containing the most common access sizes and their counts.

Parameters

report: a darshan.DarshanReport.

mod: the module to obtain the common access size table for (e.g. “POSIX”, “MPI-IO”, “H5D”).

n_rows: number of rows to keep. Default is 4.

Returns

common_access_table: a DarshanReportTable containing the n_rows most common access sizes and their counts for the specified module. The table is sorted in descending order based on the access size count and can be retrieved as either a pd.DataFrame or html table via the df or html attributes, respectively.

darshan.experimental.plots.plot_common_access_table.remove_nonzero_rows(df: Any) → Any

Removes dataframe rows that contain all zero values.

Parameters

df: a pd.DataFrame.

Returns

A pd.DataFrame containing a subset of rows from the input dataframe, where each row contains at least 1 non-zero value.
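Despite its name, the function described above drops rows whose values are all zero. A stdlib sketch of that filter, assuming rows are plain sequences; in pandas an equivalent filter is df.loc[(df != 0).any(axis=1)]. drop_all_zero_rows is a hypothetical name.

```python
from typing import List, Sequence


def drop_all_zero_rows(rows: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Keep only rows with at least one non-zero value (hypothetical helper)."""
    return [row for row in rows if any(value != 0 for value in row)]
```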

darshan.experimental.plots.plot_dxt_heatmap module

Module for creating the ranks vs. time IO intensity heatmap figure for the Darshan job summary.

darshan.experimental.plots.plot_dxt_heatmap.adjust_for_colorbar(jointgrid: Any, fig_right: float, cbar_x0: float)

Adjusts subplot locations so that a colorbar fits in the overall figure panel.

Parameters

jointgrid: a sns.axisgrid.JointGrid object.

fig_right: the location to set for the right side of the heatmap figure.

cbar_x0: the x-axis location of the colorbar.

darshan.experimental.plots.plot_dxt_heatmap.determine_hmap_runtime(report: DarshanReport) → Tuple[float, float]

Determines the effective heatmap runtime to use for plotting, so that a common maximum displayed runtime is achieved whether only DXT, only HEATMAP, or both module types are available.

Parameters

report: a darshan.DarshanReport.

Returns

A tuple of floats, (tmax, runtime).

darshan.experimental.plots.plot_dxt_heatmap.get_x_axis_tick_labels(max_time: float, n_xlabels: int = 4) → npt.NDArray[np.float64] | npt.NDArray[np.intc]

Creates the x-axis tick mark labels.

Parameters

max_time: the maximum time to plot.

n_xlabels: the number of x-axis tick marks to create. Default is 4.

Returns

x_ticklabels: array of x-axis tick mark labels of length n_xlabels.

darshan.experimental.plots.plot_dxt_heatmap.get_x_axis_ticks(bin_max: float, n_xlabels: int = 4) → npt.NDArray[np.float64]

Creates the x-axis tick mark locations.

Parameters

bin_max: the maximum number of bins.

n_xlabels: the number of x-axis tick marks to create. Default is 4.

Returns

Array of x-axis tick mark locations of length n_xlabels.
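A sketch of how tick locations and tick labels could relate, assuming (not confirmed by these docs) that both are evenly spaced from zero, locations over [0, bin_max] in bin units and labels over [0, max_time] in seconds, in the manner of np.linspace. The npt.NDArray[np.intc] return type of get_x_axis_tick_labels suggests the labels may be converted to integers in some cases, which this sketch does not model. evenly_spaced is a hypothetical helper.

```python
from typing import List


def evenly_spaced(stop: float, num: int) -> List[float]:
    """num evenly spaced values from 0 to stop inclusive, like np.linspace(0, stop, num)."""
    if num < 2:
        return [0.0] * num
    step = stop / (num - 1)
    return [i * step for i in range(num)]


# Tick locations are in bin units, tick labels are in seconds (assumed mapping).
bin_max, tmax, n_xlabels = 200, 12.0, 5
xticks = evenly_spaced(bin_max, n_xlabels)
xticklabels = evenly_spaced(tmax, n_xlabels)
```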

darshan.experimental.plots.plot_dxt_heatmap.get_y_axis_tick_labels(ax: Any, n_ylabels: int = 6) → npt.NDArray[np.intc]

Creates the y-axis tick mark labels.

Parameters

ax: a matplotlib axis object.

n_ylabels: the number of y-axis tick mark labels to create. Default is 6.

Returns

y_ticklabels: array of y-axis tick mark labels of length n_ylabels.

darshan.experimental.plots.plot_dxt_heatmap.get_y_axis_ticks(ax: Any, n_ylabels: int = 6) → npt.NDArray[np.float64]

Creates the y-axis tick mark locations.

Parameters

ax: a matplotlib axis object.

n_ylabels: the number of y-axis tick mark labels to create. Default is 6.

Returns

yticks: array of y-axis tick mark locations of length n_ylabels.

darshan.experimental.plots.plot_dxt_heatmap.get_yticklabels(ax: Any) → List[str]

Utility function for get_y_axis_tick_labels that retrieves the y-axis tick mark labels from the input axis.

Parameters

ax: a matplotlib axis object.

Returns

y_ticklabels: list of the y-axis tick mark labels retrieved from ax.

darshan.experimental.plots.plot_dxt_heatmap.plot_heatmap(report: DarshanReport, mod: str = 'DXT_POSIX', ops: Sequence[str] = ['read', 'write'], xbins: int = 200, submodule: str | None = None) → Any

Creates a heatmap with marginal bar graphs and colorbar.

Parameters

report: a darshan.DarshanReport.

mod: the module to analyze (e.g. “DXT_POSIX”, “DXT_MPIIO”, or “HEATMAP”). Default is "DXT_POSIX".

ops: a sequence of keys designating which Darshan operations to use for data aggregation. Default is ["read", "write"].

xbins: the number of x-axis bins to create. Default is 200. Has no effect when mod is “HEATMAP”.

submodule: when mod is “HEATMAP”, specifies the source of the runtime heatmap data; otherwise it has no effect. Default is None.

Returns

jgrid: a sns.axisgrid.JointGrid object containing a heat map of IO data, marginal bar graphs, and a colorbar.

Raises

NotImplementedError: if a DXT module is not input (e.g. “DXT_POSIX”).

ValueError: if the input module is not in the DarshanReport.

darshan.experimental.plots.plot_dxt_heatmap.remove_marginal_graph_ticks_and_labels(marg_x: Any, marg_y: Any)

Removes the frame, tick marks, and tick mark labels for the marginal bar graphs.

Parameters

marg_x: an x-axis marginal bar graph object.

marg_y: a y-axis marginal bar graph object.

darshan.experimental.plots.plot_dxt_heatmap.set_x_axis_ticks_and_labels(jointgrid: Any, tmax: float, bin_max: float, n_xlabels: int = 4)

Sets the x-axis tick mark locations and labels.

Parameters

jointgrid: a sns.axisgrid.JointGrid object.

tmax: the maximum time to plot.

bin_max: the maximum number of bins.

n_xlabels: the number of x-axis tick marks to create. Default is 4.

darshan.experimental.plots.plot_dxt_heatmap.set_y_axis_ticks_and_labels(jointgrid: Any, n_ylabels: int = 6)

Sets the y-axis tick mark locations and labels.

Parameters

jointgrid: a sns.axisgrid.JointGrid object.

n_ylabels: the number of y-axis tick mark labels to create. Default is 6.

darshan.experimental.plots.plot_dxt_heatmap2 module

darshan.experimental.plots.plot_dxt_heatmap2.plot_dxt_heatmap2(report, xbins=10, ybins=None, group_by='rank', mods=None, ops=None, display_values=False, cmap=None, figsize=None, ax=None, amplify=False)

Generates a heatmap plot from a report with DXT traces.

Parameters:
  • report (darshan.DarshanReport) – report to generate plot from

  • xbins (int) – number of bins on the x axis

  • ybins (int) – number of bins on the y axis

  • group_by (str) – attribute to group by (e.g., rank, hostname)

  • mods (list) – modules to include in heatmap (e.g., [‘DXT_POSIX’, ‘DXT_MPIIO’])

  • ops (list) – operations to consider (e.g., [‘read’, ‘write’])

  • display_values (bool) – show values per heatmap field

  • cmap – overwrite colormap (see matplotlib colormaps)

  • figsize – change figure size (see matplotlib figsize)

  • amplify (int) – paint neighbouring cells (e.g., when working with many ranks)

darshan.experimental.plots.plot_io_cost module

Module for creating the I/O cost bar graph for the Darshan job summary.

darshan.experimental.plots.plot_io_cost.combine_hdf5_modules(df: Any) → Any

Combines the “H5F” and “H5D” rows in the input dataframe into a single entry under the “HDF5” title.

Parameters

df: a pd.DataFrame containing the average read, write, and meta times for various pydarshan modules (e.g. “POSIX”, “MPI-IO”, “STDIO”).

Returns

Modified version of the input dataframe in which any “H5F” and “H5D” rows have been renamed and/or summed under a single new index, “HDF5”.

Notes

If a single HDF5-related module is present it will be renamed as “HDF5”. If no HDF5-related modules are present the dataframe will be unchanged.
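The rename-or-sum behaviour in the Notes can be sketched with plain dictionaries, assuming each module row is a list of average times (e.g. read, write, meta) rather than a DataFrame row. combine_hdf5 is a hypothetical helper.

```python
from typing import Dict, List

Times = List[float]  # e.g. [read, write, meta] average times for one module


def combine_hdf5(rows: Dict[str, Times]) -> Dict[str, Times]:
    """Merge "H5F" and "H5D" rows into a single "HDF5" row (hypothetical helper)."""
    h5_rows = [rows[mod] for mod in ("H5F", "H5D") if mod in rows]
    if not h5_rows:
        return dict(rows)  # no HDF5-related modules: return an unchanged copy
    combined = {mod: t for mod, t in rows.items() if mod not in ("H5F", "H5D")}
    # element-wise sum; with a single HDF5 module this amounts to a rename
    combined["HDF5"] = [sum(vals) for vals in zip(*h5_rows)]
    return combined
```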

darshan.experimental.plots.plot_io_cost.combine_pnetcdf_modules(df: Any) → Any

darshan.experimental.plots.plot_io_cost.get_by_avg_series(df: Any, mod_key: str, nprocs: int) → Any

Creates the “by-average” series for the stacked bar graph in the I/O cost figure.

Parameters

df: the dataframe containing the relevant data, typically the “fcounter” data from a Darshan report.

mod_key: module to generate the I/O cost stacked bar graph for (e.g. “POSIX”, “MPI-IO”, “STDIO”).

nprocs: the number of MPI ranks used for the log of interest.

Returns

by_avg_series: a pd.Series containing the average read, write, meta, and wait times.

darshan.experimental.plots.plot_io_cost.get_io_cost_df(report: DarshanReport) → Any

Generates the I/O cost dataframe which contains the raw data to plot the I/O cost stacked bar graph.

Parameters

report: a darshan.DarshanReport.

Returns

io_cost_df: a pd.DataFrame containing the average read, write, and meta times.

darshan.experimental.plots.plot_io_cost.plot_io_cost(report: DarshanReport) → Any

Creates a stacked bar graph illustrating the percentage of runtime spent in read, write, and metadata operations.

Parameters

report: a darshan.DarshanReport.

Returns

io_cost_fig: a matplotlib.figure.Figure object containing a stacked bar graph of the average read, write, and metadata times.
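Turning average times into the percentages shown in the stacked bars can be sketched as below, assuming each segment's height is its average per-process time divided by the job runtime. How the runtime is obtained is not specified here and is an assumption; runtime_percentages is a hypothetical helper.

```python
from typing import Dict


def runtime_percentages(avg_times: Dict[str, float], runtime: float) -> Dict[str, float]:
    """Express average per-process times as a percentage of the job runtime."""
    if runtime <= 0:
        raise ValueError("runtime must be positive")
    return {category: 100.0 * seconds / runtime for category, seconds in avg_times.items()}
```

For example, 2.0 s of reads in a 10.0 s job would be drawn as a 20% segment.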

darshan.experimental.plots.plot_opcounts module

darshan.experimental.plots.plot_opcounts.autolabel(ax, rects)

Attach a text label above each bar in rects, displaying its height.

darshan.experimental.plots.plot_opcounts.gather_count_data(report, mod)

Collects the module counts and labels for the I/O Operation Count plot.

darshan.experimental.plots.plot_opcounts.plot_opcounts(report, mod, ax=None)

Generates a bar chart summary for operation counts.

Parameters

report (DarshanReport): darshan report object to plot

mod: the module to plot operation counts for (e.g. “POSIX”, “MPI-IO”, “STDIO”, “H5F”, “H5D”). If “H5D” is input, the returned figure will contain both “H5F” and “H5D” module data.

darshan.experimental.plots.plot_posix_access_pattern module

darshan.experimental.plots.plot_posix_access_pattern.autolabel(ax, rects)

Attach a text label above each bar in rects, displaying its value.

darshan.experimental.plots.plot_posix_access_pattern.plot_posix_access_pattern(record, ax=None)

Plots read/write access patterns (sequential vs consecutive access counts) for a given POSIX module file record.

Args:

record (dict): POSIX module record to plot access pattern for.