A logfile-creation library for parallel programs has a number of
requirements. Meeting these requirements is eased by assuming that MPI is
available, and we do make that assumption.
- Logging must be so efficient that it does not materially affect the
behavior of the program. This means that I/O should be carried out only
after the program finishes.
- It is convenient if at the end of the program there is only one logfile,
rather than one for each process.
- Timestamps cannot, in general, be assumed to be synchronized among
processes. The assumption does hold on some machines, such as certain SMPs
or the IBM SP when its switch clock is used, but in general some
postprocessing is needed to synchronize the timestamps.
- The data in the logfile should be self-describing to some extent, for
the convenience of the display tool.
CLOG makes a number of compromises in meeting these requirements.
- To obtain a timestamp, CLOG calls MPI_Wtime, which returns a
floating-point number of seconds since some time in the past. The
assumption is that MPI_Wtime is reasonably efficient on any MPI
implementation, although this particular format may not be the most
efficient way to get a timestamp on a given machine. We choose this
approach for portability.
- When a log record is to be written, a subset of its content is stored in
a relatively large buffer in memory. When this buffer fills up, another one
is malloc'd from the system. An alternative would be to write the full
buffer to disk and reuse the space, but the I/O might perturb the
computation. Of course, calling malloc also perturbs the computation, but
less than I/O would. We are planning to make this mechanism scalable to
larger logfiles by periodically dumping to disk.
- At the end of the computation, the buffers are processed to add
information that is the same for all records in the buffer (process id, for
example). At this time the timestamps are adjusted. A relatively
straightforward algorithm is used to find relative clock offsets. Process 0
requests a local time value from each of the other processes, and assumes
that other processes read their own clocks midway between the sending of
this request and receiving the answer. The request is repeated several
times, and the result with the lowest error (shortest round-trip time) is
used. More accurate but more complicated algorithms are known; our algorithm
corrects only for displacement (a constant offset between clocks), not for
dilation (clocks running at different rates), but so far it has proved
adequate. As a last resort, Jumpshot itself has a method for fine-tuning the
displacement of the events in a given process at display time by dragging an
individual time line.
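The offset estimate described in the last item can be sketched in a few lines. This is an illustrative sketch only, not CLOG's code: the function names `estimate_offset` and `adjust` and the tuple layout are assumptions, and the samples are supplied as plain numbers rather than gathered through MPI messages.

```python
def estimate_offset(samples):
    """Estimate a remote clock's displacement relative to process 0.

    Each sample is a hypothetical tuple (t_send, t_remote, t_recv):
    t_send and t_recv are read on process 0's clock, t_remote is the
    value the other process reported from its own clock.
    Returns (offset, round_trip) for the sample with the shortest
    round trip, which is taken to have the lowest error.
    """
    t_send, t_remote, t_recv = min(samples, key=lambda s: s[2] - s[0])
    # Assume the remote clock was read midway between send and receive.
    midpoint = (t_send + t_recv) / 2.0
    return t_remote - midpoint, t_recv - t_send


def adjust(timestamps, offset):
    """Shift remote timestamps onto process 0's time base.

    Only displacement is corrected; dilation is ignored.
    """
    return [t - offset for t in timestamps]


# Three made-up request/reply exchanges with one remote process.
samples = [
    (10.0, 15.5, 10.5),
    (11.0, 16.125, 11.25),
    (12.0, 17.5, 13.0),
]
offset, round_trip = estimate_offset(samples)
```

Repeating the request matters because the midpoint assumption can be wrong by at most half the round-trip time, so the shortest exchange gives the tightest bound on the error.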
At the end of the run, all processes participate in merging the records by
timestamp to create a single logfile. After the last log record is written
and postprocessing of local buffers is complete, each process has a linked
list of buffers, each containing a large number of log records. The processes
form themselves into a binary tree, with (MPI) Process 0 at the root. The
processes at the leaves begin by sending their buffers to their parents. Each
nonleaf process performs a three-way merge of its own buffer with the buffers
arriving from its children. When a merged buffer has been filled, it is sent
to its parent. At the root, merged, filled buffers are written to the
logfile. MPI is used for all communication. The file is in MPI's
``external32'' portable format [7], the same format that Java uses for
file portability. This makes the logfile portable but requires byte-swapping
of some fields in the log records on some machines.
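The tree-structured merge can be simulated in a single process to show its shape. The sketch below is an assumption-laden illustration rather than CLOG's implementation: the text does not say how ranks are laid out in the tree, so the heap-style numbering (children of rank r are 2r+1 and 2r+2) is assumed, the record layout `(timestamp, rank, event)` is hypothetical, and recursion stands in for the MPI messages that carry filled buffers to the parent.

```python
import heapq


def parent(rank):
    """Parent of a rank in the assumed heap-style tree (root is rank 0)."""
    return None if rank == 0 else (rank - 1) // 2


def children(rank, nprocs):
    """Children of a rank in the assumed heap-style binary tree."""
    return [c for c in (2 * rank + 1, 2 * rank + 2) if c < nprocs]


def merged_stream(rank, local_records):
    """Lazily merge a process's records with its children's streams.

    local_records[r] is the timestamp-sorted record list of rank r.
    A process with two children performs a three-way merge; a leaf
    simply passes its own records upward.
    """
    nprocs = len(local_records)
    streams = [local_records[rank]]
    streams += [merged_stream(c, local_records) for c in children(rank, nprocs)]
    return heapq.merge(*streams)


# Hypothetical records for four processes: (timestamp, rank, event).
local = [
    [(0.1, 0, "a"), (0.9, 0, "b")],
    [(0.2, 1, "c")],
    [(0.05, 2, "d"), (0.5, 2, "e")],
    [(0.3, 3, "f")],
]
logfile = list(merged_stream(0, local))  # what the root would write
```

Because each stream arriving at a process is already ordered by timestamp, every process needs to compare only the heads of at most three streams, so no process ever has to hold the entire log in memory at once.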
The postmortem processing, although it makes collection of the logfiles less
intrusive, makes Jumpshot less useful for debugging, since the logfile is
fully assembled only when the program terminates normally. This underscores
the fact that Jumpshot is a performance debugging tool rather than a
correctness debugging tool.