darshan.experimental.plots package

Submodules

darshan.experimental.plots.data_access_by_filesystem module

darshan.experimental.plots.heatmap_handling module

Module of data pre-processing functions for constructing the heatmap figure.

class darshan.experimental.plots.heatmap_handling.SegDict(*args, **kwargs)[source]

Bases: dict

Custom type hint class for dict_list argument in get_rd_wr_dfs().

hostname: str

id: int

rank: int

read_count: int

read_segments: DataFrame

write_count: int

write_segments: DataFrame

darshan.experimental.plots.heatmap_handling.get_aggregate_data(report: Any, mod: str = 'DXT_POSIX', ops: Sequence[str] = ['read', 'write']) → DataFrame[source]: Aggregates the data based on which modules and operations are selected.

Parameters

report: a darshan.DarshanReport.

mod: the DXT module to do analysis for (i.e. “DXT_POSIX” or “DXT_MPIIO”). Default is "DXT_POSIX".

ops: a sequence of keys designating which Darshan operations to use for data aggregation. Default is ["read", "write"].

Returns

agg_df: a pd.DataFrame containing the aggregated data determined by the input modules and operations.

Raises

ValueError: raised if the selected modules/operations don’t contain any data.

Notes

Since read and write events are considered unique events, if both are selected their dataframes are simply concatenated.

Examples

agg_df generated from tests/input/sample-dxt-simple.darshan:

       length  start_time  end_time  rank
    0      40    0.103379  0.103388     0
    1    4000    0.104217  0.104231     0
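The concatenation described in the Notes can be sketched with plain pandas. This is only an illustration, not the library's implementation: the write rows are copied from the example output, while the read row and the variable names are made up for the sketch.

```python
import pandas as pd

# Write events copied from the example output.
write_df = pd.DataFrame(
    {
        "length": [40, 4000],
        "start_time": [0.103379, 0.104217],
        "end_time": [0.103388, 0.104231],
        "rank": [0, 0],
    }
)
# Hypothetical read event, invented for illustration only.
read_df = pd.DataFrame(
    {"length": [512], "start_time": [0.2], "end_time": [0.25], "rank": [1]}
)

# Reads and writes are treated as distinct events, so the rows are
# simply stacked; no merging or de-duplication is attempted.
agg_df = pd.concat([read_df, write_df], ignore_index=True)
print(agg_df)
```

The result keeps the same four columns and contains one row per event, regardless of which operation produced it.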

darshan.experimental.plots.heatmap_handling.get_heatmap_df(agg_df: DataFrame, xbins: int, nprocs: int, max_time: float | None = None) → DataFrame[source]

Builds an array similar to a 2D-histogram, where the y data is the unique ranks and the x data is time. Each bin is populated with the data sum and/or proportionate data sum for all IO events read/written during the time spanned by the bin.

Parameters

agg_df: a pd.DataFrame containing the aggregated data determined by the input modules and operations.

xbins: the number of x-axis bins to create.

nprocs: the number of MPI ranks/processes used at runtime.

max_time: the maximum time to use, since input DXT data is not necessarily bounded by wallclock duration.

Returns

hmap_df: dataframe with time intervals for columns and rank index (0, 1, etc.) for rows, where each element contains the data read/written by the corresponding rank in the given time interval.

Examples

The first column/bin for the hmap_df generated from “examples/example-logs/ior_hdf5_example.darshan”:

          (0.0, 0.09552296002705891]
    rank
    0                   8.951484e+05
    1                   3.746313e+05
    2                   6.350999e+05
    3                   1.048576e+06
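One way to read the "proportionate data sum" is that an event spanning several time bins contributes to each bin in proportion to the time it overlaps that bin. The sketch below illustrates that reading for a single event. It assumes the bytes are spread uniformly over the event's duration, and the helper name `bin_event` is hypothetical; the actual implementation may differ.

```python
import numpy as np

def bin_event(length, start, end, bin_edges):
    """Spread `length` bytes over time bins in proportion to overlap.

    Hypothetical helper; assumes a strictly positive event duration.
    """
    duration = end - start
    if duration <= 0:
        raise ValueError("event must have a positive duration")
    out = np.zeros(len(bin_edges) - 1)
    for i in range(len(out)):
        lo, hi = bin_edges[i], bin_edges[i + 1]
        # Time the event spends inside this bin (zero if disjoint).
        overlap = max(0.0, min(end, hi) - max(start, lo))
        out[i] = length * overlap / duration
    return out

edges = np.linspace(0.0, 4.0, 5)  # four bins with edges 0, 1, 2, 3, 4
row = bin_event(length=100, start=0.5, end=2.5, bin_edges=edges)
print(row)
```

Summing such rows over all events of a rank would give one row of a rank-by-time array, and the total of each row equals the bytes moved by that rank.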

darshan.experimental.plots.heatmap_handling.get_rd_wr_dfs(dict_list: Sequence[SegDict], ops: Sequence[str] = ['read', 'write']) → Dict[str, DataFrame][source]

Uses the DXT records to construct individual dataframes for both read and write segments.

Parameters

dict_list: a sequence of DXT records, where each record is a Python dictionary with the following keys: ‘id’, ‘rank’, ‘hostname’, ‘write_count’, ‘read_count’, ‘write_segments’, and ‘read_segments’. The read/write data is stored in read_segments and write_segments, where each is a pd.DataFrame containing the following data (columns): ‘offset’, ‘length’, ‘start_time’, ‘end_time’.

ops: a sequence of keys designating which Darshan operations to collect data for. Default is ["read", "write"].

Returns

rd_wr_dfs: dictionary where each key is an operation from the input ops parameter (i.e. “read”, “write”) and each value is a pd.DataFrame object containing all of the read/write events.

Notes

Used in get_single_df_dict().

Examples

dict_list and rd_wr_dfs generated from tests/input/sample-dxt-simple.darshan:

    dict_list = [
        {
            'id': 14388265063268455899,
            'rank': 0,
            'hostname': 'sn176.localdomain',
            'write_count': 1,
            'read_count': 0,
            'write_segments':
                   offset  length  start_time  end_time
                0       0      40    0.103379  0.103388,
            'read_segments':
                Empty DataFrame
                Columns: []
                Index: []
        },
        {
            'id': 9457796068806373448,
            'rank': 0,
            'hostname': 'sn176.localdomain',
            'write_count': 1,
            'read_count': 0,
            'write_segments':
                   offset  length  start_time  end_time
                0       0    4000    0.104217  0.104231,
            'read_segments':
                Empty DataFrame
                Columns: []
                Index: []
        },
    ]

    rd_wr_dfs = {
        'read':
            Empty DataFrame
            Columns: []
            Index: [],
        'write':
               length  start_time  end_time  rank
            0      40    0.103379  0.103388     0
            1    4000    0.104217  0.104231     0
    }
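The transformation in this example can be approximated with a few lines of pandas. The helper below, `collect_segments`, is a hypothetical stand-in written for illustration rather than the library function: it assumes that the `offset` column is dropped and that each event is tagged with its record's rank, because that is what the example output shows.

```python
import pandas as pd

def collect_segments(dict_list, ops=("read", "write")):
    """Hypothetical sketch: one DataFrame of events per operation."""
    out = {}
    for op in ops:
        frames = []
        for rec in dict_list:
            seg = rec[f"{op}_segments"]
            if seg.empty:
                continue
            # Keep size/timing columns and tag every event with its rank.
            frames.append(
                seg[["length", "start_time", "end_time"]].assign(rank=rec["rank"])
            )
        out[op] = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return out

def make_record(rec_id, length, start, end):
    # Builds a record shaped like the entries of dict_list in the example.
    return {
        "id": rec_id, "rank": 0, "hostname": "sn176.localdomain",
        "write_count": 1, "read_count": 0,
        "write_segments": pd.DataFrame(
            {"offset": [0], "length": [length],
             "start_time": [start], "end_time": [end]}
        ),
        "read_segments": pd.DataFrame(),
    }

dict_list = [
    make_record(14388265063268455899, 40, 0.103379, 0.103388),
    make_record(9457796068806373448, 4000, 0.104217, 0.104231),
]
rd_wr_dfs = collect_segments(dict_list)
print(rd_wr_dfs["write"])
```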

darshan.experimental.plots.heatmap_handling.get_single_df_dict(report: Any, mod: str = 'DXT_POSIX', ops: Sequence[str] = ['read', 'write']) → Dict[str, DataFrame][source]

Reorganizes segmented read/write data into single pd.DataFrame objects and stores them in a dictionary with an entry for each DXT module.

Parameters

report: a darshan.DarshanReport.

mod: the DXT module to do analysis for (i.e. “DXT_POSIX” or “DXT_MPIIO”). Default is "DXT_POSIX".

ops: a sequence of keys designating which Darshan operations to use for data aggregation. Default is ["read", "write"].

Returns

flat_data_dict: a nested dictionary where the input module keys (i.e. “DXT_POSIX”) are the top level keys, each of which contains an entry for each input operation (i.e. “read”/“write”) that maps to a dataframe containing all events for the specified operation.

Examples

flat_data_dict generated from tests/input/sample-dxt-simple.darshan:

    {
        'read':
            Empty DataFrame
            Columns: []
            Index: [],
        'write':
               length  start_time  end_time  rank
            0      40    0.103379  0.103388     0
            1    4000    0.104217  0.104231     0
    }

darshan.experimental.plots.plot_access_histogram module

darshan.experimental.plots.plot_access_histogram.autolabel(ax, rects)[source]: Attach a text label above each bar in rects, displaying its value.

darshan.experimental.plots.plot_access_histogram.plot_access_histogram(report, mod, ax=None)[source]

Plots a histogram of access sizes for specified module.

Args:
report (darshan.DarshanReport): report to generate plot from
mod (str): mod-string for which to generate access_histogram

darshan.experimental.plots.plot_common_access_table module

class darshan.experimental.plots.plot_common_access_table.DarshanReportTable(df: Any, **kwargs)[source]

Bases: object

Stores table figures in dataframe and html formats.

Parameters

df: a pd.DataFrame.

kwargs: keyword arguments passed to pd.DataFrame.to_html().

darshan.experimental.plots.plot_common_access_table.collapse_access_cols(df: Any, col_name: str) → Any[source]: Collapses all columns into a single column named col_name.

Parameters

df: a pd.DataFrame.

col_name: name of new column to store collapsed data.

Returns

A pd.DataFrame containing all data collapsed into column col_name.

darshan.experimental.plots.plot_common_access_table.combine_access_sizes(df: Any) → Any[source]: Combines rows with identical values in the “Access Size” column and calculates the sum for all other numeric columns.

Parameters

df: a pd.DataFrame with a column named “Access Size”.

Returns

A pd.DataFrame where “Access Size” column is the index and remaining columns contain the summed data from grouped rows.
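The described behavior matches a pandas group-and-sum. The snippet below is a sketch of that operation on made-up data, not the function's source.

```python
import pandas as pd

# Hypothetical input with a repeated access size.
df = pd.DataFrame({"Access Size": [1024, 4096, 1024], "Count": [3, 5, 2]})

# Rows sharing an access size are merged; other numeric columns are summed
# and "Access Size" becomes the index of the result.
combined = df.groupby("Access Size").sum()
print(combined)
```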

darshan.experimental.plots.plot_common_access_table.get_access_count_df(mod_df: Any, mod: str) → Any[source]: Creates a dataframe containing only the access size and count data.

Parameters

mod_df: “counters” dataframe for the input module mod from a darshan.DarshanReport.

mod: the module to obtain the common accesses table for (e.g. “POSIX”, “MPI-IO”, “H5D”).

Returns

A pd.DataFrame containing all access size data and their respective counts.

darshan.experimental.plots.plot_common_access_table.get_most_common_access_sizes(df: Any, n_rows: int = 4) → Any[source]: Returns the rows with the n_rows largest “Count” values.

Parameters

df: a pd.DataFrame with a column named “Count”.

n_rows: number of rows to keep.

Returns

A pd.DataFrame containing the largest n_rows “Count” values.
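Selecting the rows with the largest counts can be done with `DataFrame.nlargest`. The data here is invented, and whether the library uses this exact call is an assumption.

```python
import pandas as pd

# Hypothetical access-size counts.
df = pd.DataFrame(
    {"Access Size": [16, 64, 256, 1024, 4096], "Count": [7, 1, 9, 4, 2]}
)

# Keep the n_rows rows with the highest "Count", in descending order.
n_rows = 4
most_common = df.nlargest(n_rows, "Count")
print(most_common)
```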

darshan.experimental.plots.plot_common_access_table.plot_common_access_table(report: DarshanReport, mod: str, n_rows: int = 4) → DarshanReportTable[source]: Creates a table containing the most common access sizes and their counts.

Parameters

report: a darshan.DarshanReport.

mod: the module to obtain the common access size table for (e.g. “POSIX”, “MPI-IO”, “H5D”).

n_rows: number of rows to keep.

Returns

common_access_table: a DarshanReportTable containing the n_rows most common access sizes and their counts for the specified module. The table is sorted in descending order based on the access size count and can be retrieved as either a pd.DataFrame or html table via the df or html attributes, respectively.

darshan.experimental.plots.plot_common_access_table.remove_nonzero_rows(df: Any) → Any[source]: Removes dataframe rows that contain all zero values.

Parameters

df: a pd.DataFrame.

Returns

A pd.DataFrame containing a subset of rows from the input dataframe, where each row contains at least 1 non-zero value.
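Despite the function's name, the description says that rows consisting entirely of zeros are dropped. A boolean mask expresses that filter; the snippet is a sketch on made-up data that follows the description rather than the function's source.

```python
import pandas as pd

# Hypothetical data: the first row is all zeros.
df = pd.DataFrame({"a": [0, 1, 0], "b": [0, 0, 2]})

# Keep rows where at least one value is non-zero.
kept = df.loc[(df != 0).any(axis=1)]
print(kept)
```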

darshan.experimental.plots.plot_dxt_heatmap module

Module for creating the ranks vs. time IO intensity heatmap figure for the Darshan job summary.

darshan.experimental.plots.plot_dxt_heatmap.adjust_for_colorbar(jointgrid: Any, fig_right: float, cbar_x0: float)[source]: Makes various subplot location adjustments such that a colorbar can fit in the overall figure panel.

Parameters

jointgrid: a sns.axisgrid.JointGrid object.

fig_right: the location to set for the right side of the heatmap figure.

cbar_x0: the x-axis location of the colorbar.

darshan.experimental.plots.plot_dxt_heatmap.determine_hmap_runtime(report: DarshanReport) → Tuple[float, float][source]: Determine the effective heatmap runtime to be used for plotting in cases where only DXT, only HEATMAP, or both module types are available, to achieve a common max displayed runtime.

Parameters

report: a darshan.DarshanReport

Returns

A tuple of floats (tmax, runtime).

darshan.experimental.plots.plot_dxt_heatmap.get_x_axis_tick_labels(max_time: float, n_xlabels: int = 4) → npt.NDArray[np.float64] | npt.NDArray[np.intc][source]: Creates the x-axis tick mark labels.

Parameters

max_time: the maximum time to plot.

n_xlabels: the number of x-axis tick marks to create. Default is 4.

Returns

x_ticklabels: array of x-axis tick mark labels of length n_xlabels.

darshan.experimental.plots.plot_dxt_heatmap.get_x_axis_ticks(bin_max: float, n_xlabels: int = 4) → npt.NDArray[np.float64][source]: Creates the x-axis tick mark locations.

Parameters

bin_max: the maximum number of bins.

n_xlabels: the number of x-axis tick marks to create. Default is 4.

Returns

Array of x-axis tick mark locations of length n_xlabels.
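Tick locations and their labels have to line up: locations are expressed in bin units and labels in seconds. The sketch below assumes both are evenly spaced from zero to their maximum, which is a guess about the behavior, and the two helper names are hypothetical.

```python
import numpy as np

def evenly_spaced_ticks(bin_max, n_xlabels=4):
    # Hypothetical helper: n_xlabels positions from the first bin edge to the last.
    return np.linspace(0, bin_max, n_xlabels)

def evenly_spaced_labels(max_time, n_xlabels=4):
    # Hypothetical helper: time values corresponding to the positions above.
    return np.linspace(0, max_time, n_xlabels)

ticks = evenly_spaced_ticks(bin_max=200)
labels = evenly_spaced_labels(max_time=10.0)
print(ticks, labels)
```

With 200 bins covering 10 seconds, each label equals its tick position divided by 20.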

darshan.experimental.plots.plot_dxt_heatmap.get_y_axis_tick_labels(ax: Any, n_ylabels: int = 6) → npt.NDArray[np.intc][source]: Sets the y-axis tick mark labels.

Parameters

ax: a matplotlib axis object.

n_ylabels: The number of y-axis tick mark labels to create. Default is 6.

Returns

y_ticklabels: array of y-axis tick mark labels of length n_ylabels.

darshan.experimental.plots.plot_dxt_heatmap.get_y_axis_ticks(ax: Any, n_ylabels: int = 6) → npt.NDArray[np.float64][source]: Creates the y-axis tick mark locations.

Parameters

ax: a matplotlib axis object.

n_ylabels: The number of y-axis tick mark labels to create. Default is 6.

Returns

yticks: array of y-axis tick mark locations of length n_ylabels.

darshan.experimental.plots.plot_dxt_heatmap.get_yticklabels(ax: Any) → List[str][source]: Utility function for get_y_axis_tick_labels that retrieves the y-axis tick mark labels from the input axis.

Parameters

ax: a matplotlib axis object.

Returns

y_ticklabels: list of y-axis tick mark labels of length n_ylabels.

darshan.experimental.plots.plot_dxt_heatmap.plot_heatmap(report: DarshanReport, mod: str = 'DXT_POSIX', ops: Sequence[str] = ['read', 'write'], xbins: int = 200, submodule: str | None = None) → Any[source]

Creates a heatmap with marginal bar graphs and colorbar.

Parameters

report: a darshan.DarshanReport.

mod: the DXT module to do analysis for (i.e. “DXT_POSIX” or “DXT_MPIIO”). Default is "DXT_POSIX".

ops: a sequence of keys designating which Darshan operations to use for data aggregation. Default is ["read", "write"].

xbins: the number of x-axis bins to create; it has no effect when mod is HEATMAP.

submodule: when mod is HEATMAP this specifies the source of the runtime heatmap data; otherwise it has no effect.

Returns

jgrid: a sns.axisgrid.JointGrid object containing a heat map of IO data, marginal bar graphs, and a colorbar.

Raises

NotImplementedError: if a DXT module is not input (e.g. “DXT_POSIX”).

ValueError: if the input module is not in the DarshanReport.
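A typical call might look like the sketch below. It assumes a log that contains DXT_POSIX trace data. The wrapper name `save_dxt_heatmap` and its arguments are hypothetical, and opening the log with `read_all=True` is an assumption about how the report should be loaded.

```python
def save_dxt_heatmap(log_path, out_path, xbins=100):
    """Hypothetical wrapper: plot a DXT_POSIX heatmap and save it to disk."""
    # Imports are local so the sketch can be defined without darshan installed.
    import darshan
    from darshan.experimental.plots.plot_dxt_heatmap import plot_heatmap

    # Assumption: read_all=True loads the DXT records needed for the plot.
    report = darshan.DarshanReport(log_path, read_all=True)
    jgrid = plot_heatmap(report, mod="DXT_POSIX", ops=["read", "write"], xbins=xbins)
    # The return value is documented as a seaborn JointGrid.
    jgrid.savefig(out_path)
    return jgrid
```

If the log has no DXT data for the chosen module, the ValueError listed above is the expected failure.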

darshan.experimental.plots.plot_dxt_heatmap.remove_marginal_graph_ticks_and_labels(marg_x: Any, marg_y: Any)[source]: Removes the frame, tick marks, and tick mark labels for the marginal bar graphs.

Parameters

marg_x: an x-axis marginal bar graph object.

marg_y: a y-axis marginal bar graph object.

darshan.experimental.plots.plot_dxt_heatmap.set_x_axis_ticks_and_labels(jointgrid: Any, tmax: float, bin_max: float, n_xlabels: int = 4)[source]: Sets the x-axis tick mark locations and labels.

Parameters

jointgrid: a sns.axisgrid.JointGrid object.

tmax: the maximum time to plot.

bin_max: the maximum number of bins.

n_xlabels: the number of x-axis tick marks to create. Default is 4.

darshan.experimental.plots.plot_dxt_heatmap.set_y_axis_ticks_and_labels(jointgrid: Any, n_ylabels: int = 6)[source]: Sets the y-axis tick mark locations and labels.

Parameters

jointgrid: a sns.axisgrid.JointGrid object.

n_ylabels: The number of y-axis tick mark labels to create. Default is 6.

darshan.experimental.plots.plot_dxt_heatmap2 module

darshan.experimental.plots.plot_dxt_heatmap2.plot_dxt_heatmap2(report, xbins=10, ybins=None, group_by='rank', mods=None, ops=None, display_values=False, cmap=None, figsize=None, ax=None, amplify=False)[source]

Generates a heatmap plot from a report with DXT traces.

Parameters:

report (darshan.DarshanReport) – report to generate plot from
xbins (int) – number of bins on the x axis
ybins (int) – number of bins on the y axis
group_by (str) – attribute to group by (e.g., rank, hostname)
mods (list) – modules to include in heatmap (e.g., [‘DXT_POSIX’, ‘DXT_MPIIO’])
ops (list) – operations to consider (e.g., [‘read’, ‘write’])
display_values (bool) – show values per heatmap field
cmap – overwrite colormap (see matplotlib colormaps)
figsize – change figure size (see matplotlib figsize)
amplify (int) – paint neighbouring cells e.g., when working with many ranks

darshan.experimental.plots.plot_io_cost module

Module for creating the I/O cost bar graph for the Darshan job summary.

darshan.experimental.plots.plot_io_cost.combine_hdf5_modules(df: Any) → Any[source]: Combines the “H5F” and “H5D” rows in the input dataframe into a single entry under the “HDF5” title.

Parameters

df: a pd.DataFrame containing the average read, write, and meta times for various pydarshan modules (e.g. “POSIX”, “MPI-IO”, “STDIO”).

Returns

Modified version of the input dataframe, where if either or both “H5F” and “H5D” modules are present, they have been renamed and/or summed under a new index “HDF5”, if available.

Notes

If a single HDF5-related module is present it will be renamed as “HDF5”. If no HDF5-related modules are present the dataframe will be unchanged.
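The combining rule in the Notes can be sketched as follows. `merge_hdf5_rows` is a hypothetical helper written for illustration, not the library function, and the column names and timings are made up.

```python
import pandas as pd

def merge_hdf5_rows(df):
    """Hypothetical sketch: fold "H5F"/"H5D" rows into one "HDF5" row."""
    present = [m for m in ("H5F", "H5D") if m in df.index]
    if not present:
        return df  # no HDF5-related modules: leave the frame unchanged
    out = df.drop(index=present)
    # Summing a single row reproduces it, which amounts to a rename.
    out.loc["HDF5"] = df.loc[present].sum()
    return out

# Hypothetical average times (seconds) per module.
df = pd.DataFrame(
    {"Read": [1.0, 1.0, 2.0], "Write": [0.5, 0.5, 1.5], "Meta": [0.1, 0.2, 0.3]},
    index=["POSIX", "H5F", "H5D"],
)
combined = merge_hdf5_rows(df)
print(combined)
```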

darshan.experimental.plots.plot_io_cost.combine_pnetcdf_modules(df: Any) → Any[source]

darshan.experimental.plots.plot_io_cost.get_by_avg_series(df: Any, mod_key: str, nprocs: int) → Any[source]: Create the “by-average” series for the stacked bar graph in the I/O cost figure.

Parameters

df: the dataframe containing the relevant data, typically the “fcounter” data from a Darshan report.

mod_key: module to generate the I/O cost stacked bar graph for (e.g. “POSIX”, “MPI-IO”, “STDIO”).

nprocs: the number of MPI ranks used for the log of interest.

Returns

by_avg_series: a pd.Series containing the average read, write, meta, and wait times.
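A plausible reading of "by-average" is the total time across all records divided by the number of ranks. The sketch below illustrates that arithmetic under that assumption. The timings are invented, the "Read"/"Write"/"Meta" labels are illustrative, and the wait-time component mentioned above is left out.

```python
import pandas as pd

# Hypothetical per-record timings (seconds), using POSIX fcounter names.
fcounters = pd.DataFrame(
    {
        "POSIX_F_READ_TIME": [1.0, 3.0],
        "POSIX_F_WRITE_TIME": [2.0, 2.0],
        "POSIX_F_META_TIME": [0.5, 0.5],
    }
)
nprocs = 4

# Assumption: average time per rank = summed time over records / nprocs.
by_avg = fcounters.sum() / nprocs
by_avg.index = ["Read", "Write", "Meta"]  # illustrative labels
print(by_avg)
```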

darshan.experimental.plots.plot_io_cost.get_io_cost_df(report: DarshanReport) → Any[source]: Generates the I/O cost dataframe which contains the raw data to plot the I/O cost stacked bar graph.

Parameters

report: a darshan.DarshanReport.

Returns

io_cost_df: a pd.DataFrame containing the average read, write, and meta times.

darshan.experimental.plots.plot_io_cost.plot_io_cost(report: DarshanReport) → Any[source]: Creates a stacked bar graph illustrating the percentage of runtime spent in read, write, and metadata operations.

Parameters

report: a darshan.DarshanReport.

Returns

io_cost_fig: a matplotlib.pyplot.figure object containing a stacked bar graph of the average read, write, and metadata times.

darshan.experimental.plots.plot_opcounts module

darshan.experimental.plots.plot_opcounts.autolabel(ax, rects)[source]: Attach a text label above each bar in rects, displaying its height.

darshan.experimental.plots.plot_opcounts.gather_count_data(report, mod)[source]: Collect the module counts and labels for the I/O Operation Count plot.

darshan.experimental.plots.plot_opcounts.plot_opcounts(report, mod, ax=None)[source]: Generates a bar chart summary for operation counts.

Parameters

report (DarshanReport): darshan report object to plot

mod: the module to plot operation counts for (e.g. “POSIX”, “MPI-IO”, “STDIO”, “H5F”, “H5D”). If “H5D” is input the returned figure will contain both “H5F” and “H5D” module data.

darshan.experimental.plots.plot_posix_access_pattern module

darshan.experimental.plots.plot_posix_access_pattern.autolabel(ax, rects)[source]: Attach a text label above each bar in rects, displaying its value.

darshan.experimental.plots.plot_posix_access_pattern.plot_posix_access_pattern(record, ax=None)[source]

Plots read/write access patterns (sequential vs consecutive access counts) for a given POSIX module file record.

Args:
record (dict): POSIX module record to plot access pattern for.