
4.3 Estimation of MPI Communication Overhead

Figure 4.4: Graphical MPI overhead profiling using the BaseAlignedCumulativeExclusionRatio view of the Timeline window in Figure 3.10.
Image timeline_mpi_overhead

Most MPI application developers want to know how much overhead MPI calls add to their programs; in other words, how large the communication overhead is in their parallel programs. The new SLOG-2 viewer provides a graphical answer to this question for most MPI profiling systems. In MPE profiling systems, MPI states are always nested deeper than user-defined states. Therefore, disabling the user-defined states and arrows in the CumulativeExclusionRatio mode of the Timeline window leaves all MPI exclusion ratios intact and does not distort the collective meaning of the exclusion ratios.

Figure 4.4 shows a CumulativeExclusionRatio view in BaseAligned mode, which looks like a two-dimensional projection of a three-dimensional histogram in a timeline vs. time coordinate system. Base alignment makes it easy to compare the heights of preview states. From the figure, we can see that the yellow state (i.e., MPI_Barrier) takes the most time in the program, and we can also see when and where MPI_Barrier consumes the most time.

The combination of disabling user-defined states and using the BaseAlignedCumulativeExclusionRatio Timeline view provides a powerful and convenient way to estimate MPI communication overhead. Because the Timeline window is zoomable, the user can easily zoom in to identify the time and location of the bottleneck that causes the largest communication overhead. For an overall estimate of MPI overhead, a Histogram window covering the whole duration of the timeline canvas can be obtained, as shown in Figure 4.5. The empty region in each timeline is assumed to be user computation.

Figure 4.5: Overall MPI overhead histogram for Figure 4.4.
Image histogram_mpi_overhead
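
The arithmetic behind such an overall estimate can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the viewer's implementation: the function name `mpi_overhead`, the `(name, begin, end)` tuple format, and the sample intervals are all hypothetical, and the code does not read SLOG-2 files. It assumes the MPI states on one timeline do not overlap each other (consistent with MPI states being the most deeply nested) and treats any time not covered by an MPI state as user computation.

```python
from collections import defaultdict


def mpi_overhead(states, t_start, t_end):
    """Estimate per-state MPI time ratios for one timeline.

    states  : iterable of (name, begin, end) tuples (hypothetical format),
              assumed not to overlap each other on this timeline.
    t_start : start of the time window of interest.
    t_end   : end of the time window of interest.

    Returns (ratios, compute_ratio): ratios maps each state name to the
    fraction of the window it occupies; compute_ratio is the remaining
    fraction, assumed here to be user computation.
    """
    duration = t_end - t_start
    if duration <= 0:
        raise ValueError("t_end must be greater than t_start")

    time_in_state = defaultdict(float)
    for name, begin, end in states:
        # Clip each state to the window so partial overlaps count correctly.
        overlap = min(end, t_end) - max(begin, t_start)
        if overlap > 0:
            time_in_state[name] += overlap

    ratios = {name: t / duration for name, t in time_in_state.items()}
    compute_ratio = 1.0 - sum(ratios.values())
    return ratios, compute_ratio


# Made-up intervals for a single rank, in seconds.
rank0 = [
    ("MPI_Barrier", 1.0, 3.0),
    ("MPI_Send", 4.0, 4.5),
    ("MPI_Barrier", 7.0, 9.0),
]
ratios, compute_ratio = mpi_overhead(rank0, 0.0, 10.0)
for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {ratio:.2%}")
print(f"user computation (assumed): {compute_ratio:.2%}")
```

Narrowing `t_start` and `t_end` plays the role of zooming in the Timeline window: the same ratios are recomputed over a shorter interval, which shows where a given MPI state dominates.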

