darshan.experimental.plots package
Submodules
darshan.experimental.plots.data_access_by_filesystem module
darshan.experimental.plots.heatmap_handling module
Module of data pre-processing functions for constructing the heatmap figure.
- class darshan.experimental.plots.heatmap_handling.SegDict(*args, **kwargs)
Bases: dict
Custom type hint class for the dict_list argument in get_rd_wr_dfs().
- read_segments: DataFrame
- write_segments: DataFrame
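SegDict acts as a dict whose expected keys are annotated for type checkers. A minimal sketch of how such a type hint could be declared with typing.TypedDict follows. That the real class is built on TypedDict is an assumption drawn only from the "Bases: dict" line above.

```python
from typing import TypedDict

import pandas as pd


# Hypothetical re-declaration: TypedDict subclasses are plain dicts at runtime,
# while static type checkers see the annotated keys and value types.
class SegDict(TypedDict):
    read_segments: pd.DataFrame
    write_segments: pd.DataFrame


# A DXT-style record annotated with the type hint.
record: SegDict = {
    "read_segments": pd.DataFrame(),
    "write_segments": pd.DataFrame({"offset": [0], "length": [40]}),
}
```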
- darshan.experimental.plots.heatmap_handling.get_aggregate_data(report: Any, mod: str = 'DXT_POSIX', ops: Sequence[str] = ['read', 'write']) -> DataFrame
Aggregates the data based on which modules and operations are selected.
Parameters
report: a darshan.DarshanReport.
mod: the DXT module to do analysis for (e.g. "DXT_POSIX" or "DXT_MPIIO"). Default is "DXT_POSIX".
ops: a sequence of keys designating which Darshan operations to use for data aggregation. Default is ["read", "write"].
Returns
agg_df: a pd.DataFrame containing the aggregated data determined by the input modules and operations.
Raises
ValueError: raised if the selected modules/operations don't contain any data.
Notes
Since read and write events are considered unique events, if both are selected their dataframes are simply concatenated.
Examples
agg_df generated from tests/input/sample-dxt-simple.darshan:
       length  start_time  end_time  rank
    0      40    0.103379  0.103388     0
    1    4000    0.104217  0.104231     0
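The Notes above state that when both read and write are selected, their dataframes are simply concatenated. The sketch below shows that step with pandas, assuming per-operation dataframes shaped like the agg_df example. The helper name aggregate_ops is hypothetical and not part of pydarshan.

```python
from typing import Dict, Sequence

import pandas as pd


def aggregate_ops(rd_wr_dfs: Dict[str, pd.DataFrame],
                  ops: Sequence[str] = ("read", "write")) -> pd.DataFrame:
    """Hypothetical helper: stack the event tables of the selected operations."""
    frames = [rd_wr_dfs[op] for op in ops if not rd_wr_dfs[op].empty]
    if not frames:
        # Mirrors the documented ValueError when no selected operation has data.
        raise ValueError("selected operations do not contain any data")
    # Read and write events are independent, so concatenation is enough.
    return pd.concat(frames, ignore_index=True)


# Values taken from the sample-dxt-simple.darshan example above.
rd_wr_dfs = {
    "read": pd.DataFrame(),
    "write": pd.DataFrame({
        "length": [40, 4000],
        "start_time": [0.103379, 0.104217],
        "end_time": [0.103388, 0.104231],
        "rank": [0, 0],
    }),
}
agg_df = aggregate_ops(rd_wr_dfs)
```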
- darshan.experimental.plots.heatmap_handling.get_heatmap_df(agg_df: DataFrame, xbins: int, nprocs: int, max_time: float | None = None) -> DataFrame
Builds an array similar to a 2D histogram, where the y data is the unique ranks and the x data is time. Each bin is populated with the data sum and/or proportionate data sum for all IO events read/written during the time spanned by the bin.
Parameters
agg_df: a pd.DataFrame containing the aggregated data determined by the input modules and operations.
xbins: the number of x-axis bins to create.
nprocs: the number of MPI ranks/processes used at runtime.
max_time: the maximum time to use for the time axis, since input DXT data is not necessarily bounded by the wallclock duration.
Returns
hmap_df: dataframe with time intervals for columns and rank index (0, 1, etc.) for rows, where each element contains the data read/written by the corresponding rank in the given time interval.
Examples
The first column/bin for the hmap_df generated from “examples/example-logs/ior_hdf5_example.darshan”:
          (0.0, 0.09552296002705891]
    rank
    0                   8.951484e+05
    1                   3.746313e+05
    2                   6.350999e+05
    3                   1.048576e+06
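One way to fill such rank-by-time bins is to split each event's bytes across the bins it overlaps, in proportion to the overlap. The sketch below does that with NumPy and pandas. The proportional-split rule, the fallback to the latest end time when max_time is None, and the name heatmap_sketch are assumptions for illustration, not the library's code.

```python
from typing import Optional

import numpy as np
import pandas as pd


def heatmap_sketch(agg_df: pd.DataFrame, xbins: int, nprocs: int,
                   max_time: Optional[float] = None) -> pd.DataFrame:
    """Hypothetical: bytes per (rank, time bin), split by time overlap."""
    if max_time is None:
        max_time = float(agg_df["end_time"].max())
    edges = np.linspace(0.0, max_time, xbins + 1)
    hmap = np.zeros((nprocs, xbins))
    cols = ["length", "start_time", "end_time", "rank"]
    for length, start, end, rank in agg_df[cols].itertuples(index=False):
        duration = end - start
        if duration <= 0:
            # Instantaneous event: credit its bytes to the bin containing start
            # (clamped to the last bin).
            idx = min(int(np.searchsorted(edges, start, side="right")) - 1, xbins - 1)
            hmap[int(rank), idx] += length
            continue
        # Overlap between [start, end] and every bin; bytes past max_time
        # fall outside all bins and are dropped.
        overlap = np.clip(np.minimum(edges[1:], end) - np.maximum(edges[:-1], start), 0.0, None)
        hmap[int(rank)] += length * overlap / duration
    # Right-closed intervals, like the "(0.0, 0.0955...]" column label above.
    return pd.DataFrame(hmap, index=pd.RangeIndex(nprocs, name="rank"),
                        columns=pd.IntervalIndex.from_breaks(edges))
```

With this rule, each rank's row sums to the bytes that rank moved inside [0, max_time], which is a convenient sanity check.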
- darshan.experimental.plots.heatmap_handling.get_rd_wr_dfs(dict_list: Sequence[SegDict], ops: Sequence[str] = ['read', 'write']) -> Dict[str, DataFrame]
Uses the DXT records to construct individual dataframes for both read and write segments.
Parameters
dict_list: a sequence of DXT records, where each record is a Python dictionary with the following keys: 'id', 'rank', 'hostname', 'write_count', 'read_count', 'write_segments', and 'read_segments'. The read/write data is stored in read_segments and write_segments, where each is a pd.DataFrame containing the following data (columns): 'offset', 'length', 'start_time', 'end_time'.
ops: a sequence of keys designating which Darshan operations to collect data for. Default is ["read", "write"].
Returns
rd_wr_dfs: dictionary where each key is an operation from the input ops parameter (e.g. "read", "write") and each value is a pd.DataFrame object containing all of the read/write events.
Notes
Used in get_single_df_dict().
Examples
dict_list and rd_wr_dfs generated from tests/input/sample-dxt-simple.darshan:
    dict_list = [
        {
            'id': 14388265063268455899,
            'rank': 0,
            'hostname': 'sn176.localdomain',
            'write_count': 1,
            'read_count': 0,
            'write_segments':
                   offset  length  start_time  end_time
                0       0      40    0.103379  0.103388,
            'read_segments':
                Empty DataFrame
                Columns: []
                Index: []
        },
        {
            'id': 9457796068806373448,
            'rank': 0,
            'hostname': 'sn176.localdomain',
            'write_count': 1,
            'read_count': 0,
            'write_segments':
                   offset  length  start_time  end_time
                0       0    4000    0.104217  0.104231,
            'read_segments':
                Empty DataFrame
                Columns: []
                Index: []
        },
    ]
    rd_wr_dfs = {
        'read':
            Empty DataFrame
            Columns: []
            Index: [],
        'write':
               length  start_time  end_time  rank
            0      40    0.103379  0.103388     0
            1    4000    0.104217  0.104231     0
    }
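A sketch of how per-operation dataframes like rd_wr_dfs could be assembled from records shaped like dict_list: each non-empty segment table is trimmed to the columns shown in the example output, tagged with the record's rank, and concatenated. The function name rd_wr_sketch is hypothetical, and dropping the offset column is inferred only from the example output above.

```python
from typing import Dict, List, Sequence

import pandas as pd


def rd_wr_sketch(dict_list: List[dict],
                 ops: Sequence[str] = ("read", "write")) -> Dict[str, pd.DataFrame]:
    """Hypothetical: one event table per operation, with a rank column."""
    rd_wr_dfs = {}
    for op in ops:
        frames = []
        for record in dict_list:
            seg_df = record[f"{op}_segments"]
            if seg_df.empty:
                continue
            # Keep the columns shown in the example output and tag the rank.
            seg_df = seg_df[["length", "start_time", "end_time"]].copy()
            seg_df["rank"] = record["rank"]
            frames.append(seg_df)
        rd_wr_dfs[op] = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return rd_wr_dfs


# The two records from the sample-dxt-simple.darshan example (other keys omitted).
dict_list = [
    {"rank": 0, "read_segments": pd.DataFrame(),
     "write_segments": pd.DataFrame({"offset": [0], "length": [40],
                                     "start_time": [0.103379], "end_time": [0.103388]})},
    {"rank": 0, "read_segments": pd.DataFrame(),
     "write_segments": pd.DataFrame({"offset": [0], "length": [4000],
                                     "start_time": [0.104217], "end_time": [0.104231]})},
]
rd_wr_dfs = rd_wr_sketch(dict_list)
```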
- darshan.experimental.plots.heatmap_handling.get_single_df_dict(report: Any, mod: str = 'DXT_POSIX', ops: Sequence[str] = ['read', 'write']) -> Dict[str, DataFrame]
Reorganizes the segmented read/write data of the selected DXT module into a single pd.DataFrame per operation and stores them in a dictionary.
Parameters
report: a darshan.DarshanReport.
mod: the DXT module to do analysis for (e.g. "DXT_POSIX" or "DXT_MPIIO"). Default is "DXT_POSIX".
ops: a sequence of keys designating which Darshan operations to use for data aggregation. Default is ["read", "write"].
Returns
flat_data_dict: a dictionary where each input operation (e.g. "read"/"write") is a key that maps to a dataframe containing all events for that operation in module mod.
Examples
flat_data_dict generated from tests/input/sample-dxt-simple.darshan:
    {
        'read':
            Empty DataFrame
            Columns: []
            Index: [],
        'write':
               length  start_time  end_time  rank
            0      40    0.103379  0.103388     0
            1    4000    0.104217  0.104231     0
    }
darshan.experimental.plots.plot_access_histogram module
darshan.experimental.plots.plot_common_access_table module
- class darshan.experimental.plots.plot_common_access_table.DarshanReportTable(df: Any, **kwargs)
Bases: object
Stores table figures in dataframe and html formats.
Parameters
df: a pd.DataFrame.
kwargs: keyword arguments passed to pd.DataFrame.to_html().
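The class keeps the same table in two forms. A minimal sketch of that pattern, assuming the html form is rendered once at construction time; ReportTableSketch is a hypothetical stand-in, not the library class.

```python
import pandas as pd


class ReportTableSketch:
    """Hypothetical stand-in: keep a dataframe and its HTML rendering together."""

    def __init__(self, df: pd.DataFrame, **kwargs):
        self.df = df
        # Extra keyword arguments are forwarded unchanged to DataFrame.to_html().
        self.html = df.to_html(**kwargs)


table = ReportTableSketch(pd.DataFrame({"Access Size": [4096], "Count": [10]}),
                          index=False, border=0)
```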
- darshan.experimental.plots.plot_common_access_table.collapse_access_cols(df: Any, col_name: str) -> Any
Collapses all columns into a single column named col_name.
Parameters
df: a pd.DataFrame.
col_name: name of the new column to store the collapsed data.
Returns
A pd.DataFrame containing all data collapsed into column col_name.
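Collapsing every column into one can be done with pandas' DataFrame.melt, which stacks the columns one after another. This is a sketch of the behaviour described above, not necessarily how the library implements it; collapse_sketch is a hypothetical name.

```python
import pandas as pd


def collapse_sketch(df: pd.DataFrame, col_name: str) -> pd.DataFrame:
    """Hypothetical: stack all columns of df into a single column."""
    # melt() yields a "variable" column (source column name) plus a value
    # column named col_name; only the values are kept.
    return df.melt(value_name=col_name)[[col_name]]


access_df = pd.DataFrame({"POSIX_ACCESS1_ACCESS": [4096, 512],
                          "POSIX_ACCESS2_ACCESS": [1024, 0]})
collapsed = collapse_sketch(access_df, "Access Size")
```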
- darshan.experimental.plots.plot_common_access_table.combine_access_sizes(df: Any) -> Any
Combines rows with identical values in the "Access Size" column and calculates the sum for all other numeric columns.
Parameters
df: a pd.DataFrame with a column named "Access Size".
Returns
A pd.DataFrame where the "Access Size" column is the index and the remaining columns contain the summed data from grouped rows.
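Grouping on the "Access Size" column and summing gives exactly this shape, with the access size moved into the index. A sketch under that assumption; combine_sketch is a hypothetical name.

```python
import pandas as pd


def combine_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: merge rows that share an access size, summing the rest."""
    # groupby() moves "Access Size" into the (sorted) index; sum() adds up
    # the remaining numeric columns within each group.
    return df.groupby("Access Size").sum()


sizes = pd.DataFrame({"Access Size": [4096, 1024, 4096], "Count": [2, 5, 3]})
combined = combine_sketch(sizes)
```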
- darshan.experimental.plots.plot_common_access_table.get_access_count_df(mod_df: Any, mod: str) -> Any
Creates a dataframe containing only the access size and count data.
Parameters
mod_df: "counters" dataframe for the input module mod from a darshan.DarshanReport.
mod: the module to obtain the common accesses table for (e.g. "POSIX", "MPI-IO", "H5D").
Returns
A pd.DataFrame containing all access size data and their respective counts.
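A sketch of pairing access sizes with their counts. It assumes the counters follow Darshan's PREFIX_ACCESS<n>_ACCESS / PREFIX_ACCESS<n>_COUNT naming (for example POSIX_ACCESS1_ACCESS and POSIX_ACCESS1_COUNT) and that the caller passes the counter prefix rather than the module name; access_count_sketch and its prefix parameter are hypothetical.

```python
import pandas as pd


def access_count_sketch(mod_df: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Hypothetical: one row per (access size, count) pair across all records."""
    access = mod_df.filter(regex=rf"^{prefix}_ACCESS\d_ACCESS$")
    count = mod_df.filter(regex=rf"^{prefix}_ACCESS\d_COUNT$")
    # Sort both column sets so ACCESS1 pairs with COUNT1, ACCESS2 with COUNT2, ...
    access = access[sorted(access.columns)]
    count = count[sorted(count.columns)]
    # ravel() walks record by record, keeping each size aligned with its count.
    return pd.DataFrame({
        "Access Size": access.to_numpy().ravel(),
        "Count": count.to_numpy().ravel(),
    })


counters = pd.DataFrame({
    "POSIX_ACCESS1_ACCESS": [4096], "POSIX_ACCESS2_ACCESS": [1024],
    "POSIX_ACCESS1_COUNT": [10], "POSIX_ACCESS2_COUNT": [2],
    "POSIX_OPENS": [1],  # unrelated counter, ignored by the filters
})
access_counts = access_count_sketch(counters, "POSIX")
```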
- darshan.experimental.plots.plot_common_access_table.get_most_common_access_sizes(df: Any, n_rows: int = 4) -> Any
Returns the rows with the n_rows largest "Count" values.
Parameters
df: a pd.DataFrame with a column named "Count".
n_rows: number of rows to keep.
Returns
A pd.DataFrame containing the rows with the n_rows largest "Count" values.
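pandas' DataFrame.nlargest performs this selection in one call and returns the rows in descending order of the chosen column. A sketch, with most_common_sketch as a hypothetical name.

```python
import pandas as pd


def most_common_sketch(df: pd.DataFrame, n_rows: int = 4) -> pd.DataFrame:
    """Hypothetical: keep the n_rows rows with the highest "Count"."""
    # nlargest() sorts the kept rows by "Count", largest first.
    return df.nlargest(n_rows, "Count")


sizes = pd.DataFrame({"Access Size": [1, 2, 4, 8, 16], "Count": [5, 1, 9, 3, 7]})
top = most_common_sketch(sizes, n_rows=2)
```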
- darshan.experimental.plots.plot_common_access_table.plot_common_access_table(report: DarshanReport, mod: str, n_rows: int = 4) -> DarshanReportTable
Creates a table containing the most common access sizes and their counts.
Parameters
report: a darshan.DarshanReport.
mod: the module to obtain the common access size table for (e.g. "POSIX", "MPI-IO", "H5D").
n_rows: number of rows to keep.
Returns
common_access_table: a DarshanReportTable containing the n_rows most common access sizes and their counts for the specified module. The table is sorted in descending order based on the access size count and can be retrieved as either a pd.DataFrame or an html table via the df or html attributes, respectively.
- darshan.experimental.plots.plot_common_access_table.remove_nonzero_rows(df: Any) -> Any
Removes dataframe rows in which every value is zero.
Parameters
df: a pd.DataFrame.
Returns
A pd.DataFrame containing a subset of rows from the input dataframe, where each row contains at least one non-zero value.
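The filter can be expressed as a boolean mask that is true wherever a row has at least one non-zero entry. A sketch, with drop_all_zero_rows as a hypothetical name.

```python
import pandas as pd


def drop_all_zero_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: keep a row when at least one of its values is non-zero."""
    return df.loc[(df != 0).any(axis=1)]


sizes = pd.DataFrame({"Access Size": [4096, 0, 1024], "Count": [10, 0, 0]})
kept = drop_all_zero_rows(sizes)
```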
darshan.experimental.plots.plot_dxt_heatmap module
Module for creating the ranks vs. time IO intensity heatmap figure for the Darshan job summary.
- darshan.experimental.plots.plot_dxt_heatmap.adjust_for_colorbar(jointgrid: Any, fig_right: float, cbar_x0: float)
Makes various subplot location adjustments such that a colorbar can fit in the overall figure panel.
Parameters
jointgrid: a sns.axisgrid.JointGrid object.
fig_right: the location to set for the right side of the heatmap figure.
cbar_x0: the x-axis location of the colorbar.
- darshan.experimental.plots.plot_dxt_heatmap.determine_hmap_runtime(report: DarshanReport) -> Tuple[float, float]
Determines the effective heatmap runtime to use for plotting when only DXT, only HEATMAP, or both module types are available, so that a common maximum runtime is displayed.
Parameters
report: a darshan.DarshanReport.
Returns
A tuple of floats (tmax, runtime).
- darshan.experimental.plots.plot_dxt_heatmap.get_x_axis_tick_labels(max_time: float, n_xlabels: int = 4) -> npt.NDArray[np.float64] | npt.NDArray[np.intc]
Creates the x-axis tick mark labels.
Parameters
max_time: the maximum time to plot.
n_xlabels: the number of x-axis tick marks to create. Default is 4.
Returns
x_ticklabels: array of x-axis tick mark labels of length n_xlabels.
- darshan.experimental.plots.plot_dxt_heatmap.get_x_axis_ticks(bin_max: float, n_xlabels: int = 4) -> npt.NDArray[np.float64]
Creates the x-axis tick mark locations.
Parameters
bin_max: the maximum number of bins.
n_xlabels: the number of x-axis tick marks to create. Default is 4.
Returns
Array of x-axis tick mark locations of length n_xlabels.
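Assuming the tick marks are evenly spaced from 0 to bin_max and the labels evenly spaced from 0 to max_time, both arrays can come from np.linspace. The helper names and the rule that switches labels to integers for long runs (here max_time >= 10) are hypothetical; the signatures above only show that the labels may be float64 or intc arrays.

```python
import numpy as np


def x_ticks_sketch(bin_max: float, n_xlabels: int = 4) -> np.ndarray:
    """Hypothetical: evenly spaced tick positions in bin coordinates."""
    return np.linspace(0.0, bin_max, n_xlabels)


def x_ticklabels_sketch(max_time: float, n_xlabels: int = 4) -> np.ndarray:
    """Hypothetical: evenly spaced time labels matching the tick positions."""
    labels = np.linspace(0.0, max_time, n_xlabels)
    if max_time >= 10:
        # Hypothetical rule: whole seconds are readable enough for long runs.
        return labels.round().astype(np.intc)
    return labels.round(2)


ticks = x_ticks_sketch(200, 4)
labels = x_ticklabels_sketch(0.5, 3)
int_labels = x_ticklabels_sketch(100, 5)
```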
- darshan.experimental.plots.plot_dxt_heatmap.get_y_axis_tick_labels(ax: Any, n_ylabels: int = 6) -> npt.NDArray[np.intc]
Sets the y-axis tick mark labels.
Parameters
ax: a matplotlib axis object.
n_ylabels: the number of y-axis tick mark labels to create. Default is 6.
Returns
y_ticklabels: array of y-axis tick mark labels of length n_ylabels.
- darshan.experimental.plots.plot_dxt_heatmap.get_y_axis_ticks(ax: Any, n_ylabels: int = 6) -> npt.NDArray[np.float64]
Creates the y-axis tick mark locations.
Parameters
ax: a matplotlib axis object.
n_ylabels: the number of y-axis tick mark labels to create. Default is 6.
Returns
yticks: array of y-axis tick mark locations of length n_ylabels.
- darshan.experimental.plots.plot_dxt_heatmap.get_yticklabels(ax: Any) -> List[str]
Utility function for get_y_axis_tick_labels that retrieves the y-axis tick mark labels from the input axis.
Parameters
ax: a matplotlib axis object.
Returns
y_ticklabels: list of y-axis tick mark labels of length n_ylabels.
- darshan.experimental.plots.plot_dxt_heatmap.plot_heatmap(report: DarshanReport, mod: str = 'DXT_POSIX', ops: Sequence[str] = ['read', 'write'], xbins: int = 200, submodule: str | None = None) -> Any
Creates a heatmap with marginal bar graphs and a colorbar.
Parameters
report: a darshan.DarshanReport.
mod: the DXT module to do analysis for (e.g. "DXT_POSIX" or "DXT_MPIIO"). Default is "DXT_POSIX".
ops: a sequence of keys designating which Darshan operations to use for data aggregation. Default is ["read", "write"].
xbins: the number of x-axis bins to create; it has no effect when mod is HEATMAP.
submodule: when mod is HEATMAP, this specifies the source of the runtime heatmap data; otherwise it has no effect.
Returns
jgrid: a sns.axisgrid.JointGrid object containing a heatmap of IO data, marginal bar graphs, and a colorbar.
Raises
NotImplementedError: if a DXT module is not input (e.g. "DXT_POSIX").
ValueError: if the input module is not in the DarshanReport.
- darshan.experimental.plots.plot_dxt_heatmap.remove_marginal_graph_ticks_and_labels(marg_x: Any, marg_y: Any)
Removes the frame, tick marks, and tick mark labels for the marginal bar graphs.
Parameters
marg_x: an x-axis marginal bar graph object.
marg_y: a y-axis marginal bar graph object.
- darshan.experimental.plots.plot_dxt_heatmap.set_x_axis_ticks_and_labels(jointgrid: Any, tmax: float, bin_max: float, n_xlabels: int = 4)
Sets the x-axis tick mark locations and labels.
Parameters
jointgrid: a sns.axisgrid.JointGrid object.
tmax: the maximum time to plot.
bin_max: the maximum number of bins.
n_xlabels: the number of x-axis tick marks to create. Default is 4.
darshan.experimental.plots.plot_dxt_heatmap2 module
- darshan.experimental.plots.plot_dxt_heatmap2.plot_dxt_heatmap2(report, xbins=10, ybins=None, group_by='rank', mods=None, ops=None, display_values=False, cmap=None, figsize=None, ax=None, amplify=False)
Generates a heatmap plot from a report with DXT traces.
- Parameters:
report (darshan.DarshanReport) – report to generate plot from
xbins (int) – number of bins on the x axis
ybins (int) – number of bins on the y axis
group_by (str) – attribute to group by (e.g., rank, hostname)
mods (list) – modules to include in heatmap (e.g., [‘DXT_POSIX’, ‘DXT_MPIIO’])
ops (list) – operations to consider (e.g., [‘read’, ‘write’])
display_values (bool) – show values per heatmap field
cmap – overwrite colormap (see matplotlib colormaps)
figsize – change figure size (see matplotlib figsize)
amplify (int) – paint neighbouring cells, e.g., when working with many ranks
darshan.experimental.plots.plot_io_cost module
Module for creating the I/O cost bar graph for the Darshan job summary.
- darshan.experimental.plots.plot_io_cost.combine_hdf5_modules(df: Any) -> Any
Combines the "H5F" and "H5D" rows in the input dataframe into a single entry under the "HDF5" title.
Parameters
df: a pd.DataFrame containing the average read, write, and meta times for various pydarshan modules (e.g. "POSIX", "MPI-IO", "STDIO").
Returns
A modified version of the input dataframe in which any "H5F" and "H5D" rows have been renamed and/or summed under a new "HDF5" index.
Notes
If a single HDF5-related module is present, it is renamed "HDF5". If no HDF5-related modules are present, the dataframe is unchanged.
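The three cases in the Notes (both rows, one row, no rows) collapse into a single rule: sum whatever HDF5 rows exist and store the result under "HDF5". A sketch assuming the module names form the dataframe index; combine_hdf5_sketch is a hypothetical name.

```python
import pandas as pd


def combine_hdf5_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical: fold "H5F"/"H5D" rows into one "HDF5" row."""
    hdf5_rows = [mod for mod in ("H5F", "H5D") if mod in df.index]
    if not hdf5_rows:
        # No HDF5-related modules: return the input unchanged.
        return df
    # Summing a single row is the same as renaming it.
    combined = df.loc[hdf5_rows].sum()
    out = df.drop(index=hdf5_rows)
    out.loc["HDF5"] = combined
    return out


io_df = pd.DataFrame({"Read": [1.0, 0.5, 2.0], "Write": [3.0, 0.25, 1.0],
                      "Meta": [0.5, 0.25, 0.5]},
                     index=["POSIX", "H5F", "H5D"])
combined_df = combine_hdf5_sketch(io_df)
```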
- darshan.experimental.plots.plot_io_cost.get_by_avg_series(df: Any, mod_key: str, nprocs: int) -> Any
Creates the "by-average" series for the stacked bar graph in the I/O cost figure.
Parameters
df: the dataframe containing the relevant data, typically the "fcounter" data from a Darshan report.
mod_key: module to generate the I/O cost stacked bar graph for (e.g. "POSIX", "MPI-IO", "STDIO").
nprocs: the number of MPI ranks used for the log of interest.
Returns
by_avg_series: a pd.Series containing the average read, write, meta, and wait times.
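A sketch of a by-average computation, assuming each average is the total time across all records divided by nprocs and that the module's fcounters are named like POSIX_F_READ_TIME, POSIX_F_WRITE_TIME and POSIX_F_META_TIME. The wait time listed above is left out because its derivation is not described here; by_avg_sketch and the "MPI-IO" to "MPIIO" prefix mapping are assumptions.

```python
import pandas as pd


def by_avg_sketch(df: pd.DataFrame, mod_key: str, nprocs: int) -> pd.Series:
    """Hypothetical: average read/write/meta seconds per rank."""
    prefix = mod_key.replace("-", "")  # assumed mapping: "MPI-IO" -> "MPIIO"
    names = {f"{prefix}_F_READ_TIME": "Read",
             f"{prefix}_F_WRITE_TIME": "Write",
             f"{prefix}_F_META_TIME": "Meta"}
    # Total seconds per category over all records, spread across the ranks.
    totals = df[list(names)].sum()
    return (totals / nprocs).rename(index=names)


fcounters = pd.DataFrame({"POSIX_F_READ_TIME": [1.0, 3.0],
                          "POSIX_F_WRITE_TIME": [2.0, 2.0],
                          "POSIX_F_META_TIME": [0.5, 0.5]})
by_avg = by_avg_sketch(fcounters, "POSIX", nprocs=2)
```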
- darshan.experimental.plots.plot_io_cost.get_io_cost_df(report: DarshanReport) -> Any
Generates the I/O cost dataframe, which contains the raw data to plot the I/O cost stacked bar graph.
Parameters
report: a darshan.DarshanReport.
Returns
io_cost_df: a pd.DataFrame containing the average read, write, and meta times.
- darshan.experimental.plots.plot_io_cost.plot_io_cost(report: DarshanReport) -> Any
Creates a stacked bar graph illustrating the percentage of runtime spent in read, write, and metadata operations.
Parameters
report: a darshan.DarshanReport.
Returns
io_cost_fig: a matplotlib.pyplot.figure object containing a stacked bar graph of the average read, write, and metadata times.
darshan.experimental.plots.plot_opcounts module
- darshan.experimental.plots.plot_opcounts.autolabel(ax, rects)
Attaches a text label above each bar in rects, displaying its height.
- darshan.experimental.plots.plot_opcounts.gather_count_data(report, mod)
Collects the module counts and labels for the I/O Operation Count plot.
- darshan.experimental.plots.plot_opcounts.plot_opcounts(report, mod, ax=None)
Generates a bar chart summary for operation counts.
Parameters
report (DarshanReport): darshan report object to plot
mod: the module to plot operation counts for (e.g. "POSIX", "MPI-IO", "STDIO", "H5F", "H5D"). If "H5D" is input, the returned figure will contain both "H5F" and "H5D" module data.