The MPI profiling interface allows the convenient construction of portable tools that rely on intercepting calls to the MPI library. Such tools are ``ultra portable'' in the sense that they can be used with any MPI implementation, not just a specific portable MPI implementation.
The MPI specification makes it possible, but not particularly convenient, for users to build their own ``profiling libraries,'' which intercept all MPI library calls. MPICH comes with three profiling libraries already constructed; we have found them useful in debugging and in performance analysis.
...
[1] Starting MPI_Bcast...
[0] Starting MPI_Bcast...
[0] Ending MPI_Bcast
[2] Starting MPI_Bcast...
[2] Ending MPI_Bcast
[1] Ending MPI_Bcast
...
One of the most useful tools for understanding parallel program behavior is a graphical display of parallel timelines with colored bars to indicate the state of each process at any given time. A number of tools developed by various groups do this. One of the earliest of these was upshot [33]. Since then upshot has been reimplemented in Tcl/Tk, and this version [34] is distributed with MPICH. It can read log files generated either by Paragraph [32] or by the mpe logging routines, which are in turn used by the logging profiling library. A sample screen dump is shown in Figure 9.

Figure 9: Upshot output
The most obvious way to use the profiling library is to choose some family of calls to intercept, and then treat each of them in a special way. Typically, one performs some action (adds to a counter, prints a message, writes a log record), calls the ``real'' MPI function using its alternate name PMPI_Xxxx, perhaps performs another action (e.g., writes another log record), and then returns to the application, propagating the return code from the PMPI routine.
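As a minimal sketch of this pattern, the following C fragment wraps MPI_Bcast so that it counts calls and prints trace lines in the style of the sample output shown earlier. The counter bcast_calls is a hypothetical name introduced for illustration. The typedefs and the two PMPI_ stubs are stand-ins that let the fragment compile without an MPI installation; in an actual profiling library one would include mpi.h instead and let the MPI implementation supply them.

```c
#include <stdio.h>

/* --- Stand-ins (assumptions) so the sketch compiles without MPI. ---
 * In a real profiling library, delete this section and #include <mpi.h>;
 * the MPI implementation then supplies the types and the PMPI_ routines. */
typedef int MPI_Datatype;
typedef int MPI_Comm;
#define MPI_SUCCESS 0

int PMPI_Comm_rank(MPI_Comm comm, int *rank)
{
    (void)comm;
    *rank = 0;                 /* stub: pretend to be process 0 */
    return MPI_SUCCESS;
}

int PMPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
               int root, MPI_Comm comm)
{
    (void)buffer; (void)count; (void)datatype; (void)root; (void)comm;
    return MPI_SUCCESS;        /* stub: performs no communication */
}
/* --- End of stand-ins. --- */

int bcast_calls = 0;           /* hypothetical counter, for illustration */

/* Profiling version of MPI_Bcast: the application calls this routine,
 * which does its bookkeeping and forwards to the real routine. */
int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
              int root, MPI_Comm comm)
{
    int rank, rc;

    PMPI_Comm_rank(comm, &rank);

    /* Action before the call: count it and write a trace line. */
    bcast_calls++;
    fprintf(stderr, "[%d] Starting MPI_Bcast...\n", rank);

    /* Call the real routine through its alternate name. */
    rc = PMPI_Bcast(buffer, count, datatype, root, comm);

    /* Action after the call: write another trace line. */
    fprintf(stderr, "[%d] Ending MPI_Bcast\n", rank);

    /* Propagate the return code from the PMPI routine. */
    return rc;
}
```

When such a routine is linked ahead of the MPI library, it is the one the application reaches when it calls MPI_Bcast, while the unmodified library code remains available under the PMPI_ name.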
MPICH includes a utility called wrappergen that lets a user specify ``templates'' for profiling routines and a list of routines to create, and then automatically creates the profiling versions of the specified routines. Thus the work required by a user to add a new profiling library is reduced to writing individual MPI_Init and MPI_Finalize routines and one template routine. The profiling libraries described above are all produced in this way. Details of how to use wrappergen can be found in [27].