The most serious source of load imbalance in PCCM2 physics is the radiation package, specifically shortwave radiation. Radiation comprises 68 percent (864.7/1267.1) of total physics computation over a representative 36-step period. The share would be larger still, except that radiation is computed only every third time step (hourly), and the principal component of longwave radiation, RADABS, is so costly that it is performed only every 36th step. The contributions of longwave (RADCLW) and shortwave (RADCSW) radiation to overall radiation (RADCTL) costs are shown in Table 3. Longwave radiation, though costly, is nearly perfectly balanced, so its effect on parallel efficiency is negligible. The source of all imbalance in radiation is the shortwave radiation package, RADCSW, because it is computed at only half the grid points (the ones in daylight) at any given time. Figure 2 shows time spent in RADCSW over the grid during a radiation time step. Only about 0.3 milliseconds of work occurs in each 2-cell partition in the nighttime region, compared with 78 milliseconds of work in a daylight 2-cell partition. Within RADCSW, the sources of imbalance are computation within RADCSW itself and in three subroutines that compute surface albedo (RADALB), the delta-Eddington solar scheme (RADDED), and the clear-sky solar computation (RADCLR) (Table 5).
Of the 614 milliseconds lost to load imbalance every 36 time steps, the imbalance in shortwave radiation accounts for 476 milliseconds, or 77.5 percent of the total physics imbalance. For a model run in which physics was 36 percent of the total cost, imbalance in RADCSW would be responsible for 8.5 percent of the total inefficiency attributable to physics load imbalance.
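The two ratios quoted so far can be checked with a few lines of arithmetic. The sketch below uses only figures stated in the text; the variable names are illustrative and do not come from the model source.

```python
# Figures quoted in the text; variable names are illustrative only.
radiation_cost = 864.7         # radiation (RADCTL) cost over the 36-step period
physics_cost = 1267.1          # total physics cost over the same period
physics_imbalance_ms = 614.0   # time lost to physics load imbalance per 36 steps
radcsw_imbalance_ms = 476.0    # portion of that loss due to RADCSW


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


radiation_share = percent(radiation_cost, physics_cost)
radcsw_imbalance_share = percent(radcsw_imbalance_ms, physics_imbalance_ms)

print(f"radiation share of physics cost:   {radiation_share:.1f} percent")
print(f"RADCSW share of physics imbalance: {radcsw_imbalance_share:.1f} percent")
```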
The regularity of this pattern of imbalance suggested a straightforward scheme for correcting a large percentage of the shortwave radiation load imbalance. Before shortwave radiation is invoked in a time step, every other point in a latitude (an east-west row of points) is exchanged between processors, decomposing that row in such a way that, after the exchange, each processor has almost the same number of day and night points. After shortwave radiation, the exchange is reversed. In spite of the cost of performing the exchanges, the load-balancing code resulted in a 6 percent overall improvement in model run times.
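The effect of the exchange can be illustrated with a small serial sketch. It is not the PCCM2 implementation: the text does not say which processors trade points, so the sketch assumes each processor is paired with the one holding the block half a row away (the opposite side of the globe), and it models the exchange as list swaps rather than message passing. The function names, the 128-point row, and the 16-processor count are all hypothetical.

```python
def daylight_mask(nlon, sunrise):
    """Mark half of a latitude row as daylight, starting at index `sunrise`."""
    return [((i - sunrise) % nlon) < nlon // 2 for i in range(nlon)]


def block_decompose(nlon, nproc):
    """Give each processor a contiguous block of longitude indices.

    Assumes nlon is divisible by nproc.
    """
    width = nlon // nproc
    return [list(range(p * width, (p + 1) * width)) for p in range(nproc)]


def exchange_every_other(blocks):
    """Swap every other point between each processor and its partner.

    Assumption: processor p is paired with processor p + nproc/2, which
    requires an even processor count. Applying the function a second time
    reverses the exchange.
    """
    nproc = len(blocks)
    half = nproc // 2
    out = [list(b) for b in blocks]
    for p in range(half):
        q = p + half
        for k in range(1, len(out[p]), 2):
            out[p][k], out[q][k] = out[q][k], out[p][k]
    return out


def day_counts(blocks, mask):
    """Count the daylight points held by each processor."""
    return [sum(mask[i] for i in block) for block in blocks]


mask = daylight_mask(nlon=128, sunrise=10)
before = block_decompose(nlon=128, nproc=16)
after = exchange_every_other(before)
restored = exchange_every_other(after)

print("daylight points per processor before:", day_counts(before, mask))
print("daylight points per processor after: ", day_counts(after, mask))
```

With this pairing, a point and the one traded for it lie half a row apart, so exactly one of the two is in daylight. That is why the per-processor counts even out. In a distributed setting each swap would become a pairwise message exchange, which is the cost the text weighs against the gain.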
One expects that the effectiveness of the exchange scheme for correcting diurnal cycle imbalance will vary seasonally because the balancing effect is in the east/west dimension only. North/south imbalances associated with seasonal variation in solar declination are not accounted for in the exchange scheme. Thus, the scheme should do well closest to the equinoxes in the simulation, when all the latitudes have the same number of daytime and nighttime points. It should do most poorly closest to the solstices, when most latitudes will have different numbers of daytime and nighttime points. However, in the special case of PCCM2, the seasonally induced north/south imbalances in shortwave radiation are not a problem because the model latitudes are decomposed symmetrically about the equator: a processor handling the latitude at 30 N would also be handling 30 S. The domain happens to be decomposed this way to exploit symmetry in the spectral domain. Thus, the lighter computational load in one hemisphere is offset by the heavier load in the corresponding region of the other hemisphere.
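The offsetting effect of the symmetric decomposition can be checked numerically. The sketch below is an idealized calculation, not taken from the model. It uses the standard sunrise equation, cos(H) = -tan(latitude) * tan(declination), and ignores refraction and the size of the solar disk. The function names are illustrative.

```python
import math


def daylight_fraction(lat_deg, decl_deg):
    """Fraction of a latitude circle in daylight for a given solar declination."""
    x = -math.tan(math.radians(lat_deg)) * math.tan(math.radians(decl_deg))
    x = max(-1.0, min(1.0, x))  # clamp to handle polar day and polar night
    return math.acos(x) / math.pi


def paired_fraction(lat_deg, decl_deg):
    """Mean daylight fraction for a processor holding both +lat and -lat."""
    north = daylight_fraction(lat_deg, decl_deg)
    south = daylight_fraction(-lat_deg, decl_deg)
    return 0.5 * (north + south)


solstice_decl = 23.44  # approximate declination at the June solstice, degrees
for lat in (10.0, 30.0, 60.0, 80.0):
    n = daylight_fraction(lat, solstice_decl)
    s = daylight_fraction(-lat, solstice_decl)
    print(f"lat {lat:4.1f}: north {n:.3f}, south {s:.3f}, "
          f"paired mean {paired_fraction(lat, solstice_decl):.3f}")
```

Under these idealized assumptions the daylight fractions at a latitude and its mirror image always sum to one, so a processor holding both sees half of its points in daylight regardless of season. This concerns point counts only. The actual cost per daylight point may still vary.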