- Changes 6/26/2004
buschelm - Added logging of calls to MPI_Iprobe and MPI_Probe
buschelm - Added logging of calls to MPI_Comm_dup and MPI_Comm_split, with
	   additional logging when the communicator used compares as MPI_IDENT
	   to MPI_COMM_WORLD.
buschelm - Improved output for synchronizing routines.
WDG - Introduced msgsize_t to allow the use of 64 bit integers
for message lengths when available.
WDG - Added MPI_Pcontrol hook.  Fixed some collective routines
      so that only data *sent* is recorded (more uniform, since the
      amount sent and the amount received differ).  "Sent" is defined
      as "provided"; e.g., bcast sends the data once, from the root.
      Added an explanatory header, and added more controls over the
      output format.  Added synctime control.
WDG - Many changes. Fixed broken table output. Restored
ability to output in multiple output formats (not just XML),
      though only text output is currently available.
Changed to structures for the data collection to reduce
overhead and simplify code.
Split a lot of long code into separate routines; moved global
declarations to the group of routines that use them. We may
want to subdivide the file at some point.
WDG - Many updates, including new configure and fpmpiconf.h file
Added nearest-neighbor collection info. Each proc will
record the number of times it communicates with others as
well as keep a running sum of the size of messages it
sends to others.
This is rather kludgey at the moment.
Added support for proposed XML schema
      "It is assumed that each performance record must contain the
      following (consistent with the second approach for performance
      data):
      - Record type (identifies the record)
      - Measurement container (consists of one or more measurements)
      - Metadata (system and application attributes that provide the
        context in which the performance data was collected)
      - Timestamp (identifies when the record was created)"
Added gtable output
Send sizes were converted to doubles due to overflows.
Output format was changed to reflect needs of the Terascale
We now print the min/max/avg min_loc/max_loc for the communication
partners across all ranks, instead of printing the list of all
ranks which participated in a point-to-point communication.
      In a 5-pt stencil, for instance, the average should be close to 4
      (each interior rank exchanges with its four neighbors), with
      min/max values differing due to boundary effects.
The heading for this section (in MPI_PROFILE_FILE) has changed
to "Communication Partners".
      Still to do: The Fortran interfaces for the MPI_Waitxxx routines
      are incorrect and need to be fixed.
- On Linux systems, the /proc/(pid)/stat file is read in order
  to get memory usage info.  The relevant lines are guarded with an
  #ifdef LINUX directive.
- We now check for the T5_EVENT1 environment variable, since
  if it is set, it generates an error (conflicting hwc error)
  on SGI machines when using the hardware counters.  If set, an
  error message is printed to stderr and execution halts.
  If not set, we check for the presence of the MPI_PROFILE_EVENT1
  environment variable.  If it is set to 21, floating-point ops are
  polled from the hardware counters; if set to 26, L2 cache misses
  are polled.  Other values are passed through unchecked.  If
  MPI_PROFILE_EVENT1 is not set, L2 cache misses are polled by default.
- The destinations/sources of each point-to-point communication
  call are tracked and printed out, under the heading
  "Processor Communication Data".
We just keep track of the ranks, independent of the communicators.
  We *should* use MPI_Group_translate_ranks to make all ranks
  relative to MPI_COMM_WORLD; in another release...
- Totals for point-to-point sends:
total calls, size of messages, time. This includes
the usual min/max/avg min-loc/max-loc info as well.
Also, the communication rate (size/time) is calculated.
- Total for collective sends: (same as above)
- Total "barrier" time is shown for collective routines
  (this is the time spent waiting for all nodes to initiate
  the operation).
- added profiling for MPI_Waitxxx calls (C only)
- stats printed for Waits & Barrier calls.
- added more MPI routines (Sendrecv_replace, Pack/Unpack)