"Clever name": a collection of I/O tracing utilities
While there are several I/O tracing libraries out there (MPE, TAU, PABLO, etc.),
none of them quite fit my needs.
Purpose
I wanted to know, at the MPI-IO layer, which processes write how much data to
what offset.
I didn't know I wanted to know that until I saw John Bauer in IBM's Deep
Computing group (bauerj@us.ibm.com) present his DataView visualizer for his
"Modular I/O" (MIO) infrastructure at ScicomP 14.
A few weeks later Rob Ross showed me some slides he presented at SciDAC 2008.
Yuan Hong and Han-Wei Shen produced a graphic to show a 2D row-major
representation of a dataset. (The official citation is Y. Hong and H.-W.
Shen, "Histogram-based visibility culling in visualizing large volume data",
OSU Technical Report OSU-CISRC-7/08-TR38, 2008). This representation seemed
like a good way to visualize a problem I was having at the time.
With these motivating examples in mind, I wrote my own versions with lots
of feedback from Rob Ross, Tom Peterka and Han-Wei Shen.
File format
I created yet another log file format for this: it's a series of plain ASCII
lines, one per I/O operation, each with the following fields:
rank op offset length start_time end_time
- rank: MPI rank of process
- op: I/O operation. Either 'r' for read or 'w' for write.
- offset: starting offset of operation
- length: amount of data, in bytes, in operation
- start_time: the MPI_Wtime timestamp marking the beginning of the operation
- end_time: the MPI_Wtime timestamp marking the end of the operation
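To make the field list above concrete, here is a minimal Python sketch of a parser for one log line. It is not part of the distributed scripts. It assumes the fields are separated by whitespace, that rank, offset and length are decimal integers, and that the two timestamps are floating-point seconds; the names IORecord and parse_line are made up for this illustration.

```python
from typing import NamedTuple


class IORecord(NamedTuple):
    """One line of the log (hypothetical helper type, not from the tarball)."""
    rank: int
    op: str            # 'r' for read, 'w' for write
    offset: int
    length: int        # bytes
    start_time: float  # MPI_Wtime at the start of the operation
    end_time: float    # MPI_Wtime at the end of the operation


def parse_line(line: str) -> IORecord:
    """Parse 'rank op offset length start_time end_time' into an IORecord.

    Assumes whitespace-separated fields and decimal integers.
    """
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    rank, op, offset, length, start, end = fields
    if op not in ("r", "w"):
        raise ValueError(f"unknown operation {op!r}")
    return IORecord(int(rank), op, int(offset), int(length),
                    float(start), float(end))


if __name__ == "__main__":
    rec = parse_line("3 w 1048576 4096 12.50 12.75")
    print(rec.rank, rec.op, rec.offset, rec.length, rec.end_time - rec.start_time)
```

Keeping the record as a named tuple means later code can refer to rec.offset or rec.length instead of positional indexes.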
This format ends up being pretty easy to work with, but it isn't very scalable:
each logged operation is one 'fprintf' call, and 100k of those add up.
If you have some other sort of log format, it should be pretty easy to convert.
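As a rough idea of what such a conversion could look like, here is a Python sketch. The input format is entirely hypothetical (a comma-separated row of rank, READ or WRITE, offset, length, start time, duration); substitute whatever your own tracer emits. The function names and the six-decimal timestamp formatting are my choices for the example, not something the tools require as far as I know.

```python
def format_record(rank, op, offset, length, start_time, end_time):
    """Render one operation as 'rank op offset length start_time end_time'."""
    if op not in ("r", "w"):
        raise ValueError(f"op must be 'r' or 'w', got {op!r}")
    return f"{rank} {op} {offset} {length} {start_time:.6f} {end_time:.6f}"


def convert_csv_row(row):
    """Convert one row of a made-up CSV trace into a log line.

    Hypothetical input columns: rank,READ|WRITE,offset,length,start,duration
    """
    rank, kind, offset, length, start, duration = row.strip().split(",")
    op = {"READ": "r", "WRITE": "w"}[kind]
    start = float(start)
    # The made-up input stores a duration; the log format wants an end time.
    return format_record(int(rank), op, int(offset), int(length),
                         start, start + float(duration))


if __name__ == "__main__":
    print(convert_csv_row("0,WRITE,0,1024,1.5,0.25"))
```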
Tools
- svgplot.pl: the first tool I wrote. Plots file accesses in 2-D, showing
the locality and size of each I/O operation.
- time-offset.pl: plots file accesses over time.
- time-offset-svg.pl: also plots file accesses over time, but outputs SVG. A
good choice if you are looking for presentation fodder. To be fair, RobR
actually wrote this one and figured out the clever Inkscape-specific SVG
metadata.
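For a feel of what a 2-D, row-major view of file accesses involves, here is a small Python sketch that splits one access (offset, length) into per-row segments. This is not the code in svgplot.pl; the function name and the row_bytes parameter (how many bytes of the file one row of the picture covers) are assumptions made for the illustration.

```python
def cells_for_access(offset, length, row_bytes):
    """Split the byte range [offset, offset + length) into row segments.

    Returns a list of (row, col_start, col_end) tuples with col_end
    exclusive, where each row of the picture covers row_bytes bytes of
    the file. row_bytes is a hypothetical parameter of this sketch.
    """
    if row_bytes <= 0:
        raise ValueError("row_bytes must be positive")
    segments = []
    pos = offset
    end = offset + length
    while pos < end:
        row = pos // row_bytes
        col = pos % row_bytes
        # Stop at the end of the access or the end of this row,
        # whichever comes first.
        seg_end = min(end, (row + 1) * row_bytes)
        segments.append((row, col, col + (seg_end - pos)))
        pos = seg_end
    return segments


if __name__ == "__main__":
    # A 10-byte access starting at offset 6, drawn 8 bytes per row.
    print(cells_for_access(6, 10, 8))
```

Each segment could then be drawn as one rectangle, so a large contiguous access shows up as a run of full rows and small scattered accesses show up as isolated specks.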
Code
Here are our scripts and utilities to make the graphics:
ioplots-20090304.tar.gz
While there might be other ways to generate the log data, I hacked fprintf
calls into ROMIO. Here's the patch:
romio-logging-patch.diff
Screenshots
Here's an example of a program that reads data out of a netCDF file:
Here's an example of a program that does independent writes to an HDF5 file:
This one gets a little crazy: we associate colors with the ranks. The SVG
version labels the x and y axes, which makes it easier to understand.
Last modified:
Wed Mar 4 16:13:42 CST 2009