CCM model physics comprises a number of computational modules (Section 2). How seriously a module affects load imbalance in the parallel model depends on two things: how much imbalance there is within the module, and how large a share of the total physics time the module accounts for. Table 2 shows the time processors spent performing useful work in the major modules of CCM2 physics and the time lost to load imbalance. The ``useful'' time is the mean, over processors, of the time spent in a module in the hypothetical 4096-processor decomposition with 2 cells per processor. The time lost to imbalance is the time spent in that module on the processor with the longest total physics time (summed over all modules), minus the mean.

An alternative way to compute this time would be to take the maximum for
a single module of the code and subtract the mean, to determine the
inefficiency for that module.
However, it is uncertain whether the
maximum in each module would occur on the same processor. Therefore,
although this method shows the absolute imbalance for a particular module,
it would be inappropriate to add together the
inefficiencies for different modules.
Since we are interested
in the *net* effect of imbalances, we used the former method of
calculation, considering the time for each module on the processor
with the maximum overall physics time. In practice, we discovered that
the overall difference between the two ways of calculating the
inefficiency is small: adding together times produced by the alternative
calculation generates an average physics time step of 1938
milliseconds, which is only 3 percent above the net time of 1881
milliseconds. This suggests that there is little canceling out of
imbalances in the physics, because the imbalance from the diurnal cycle
in the radiation module (RADCTL) dominates the rest of the profile.
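
The two calculations can be illustrated with a minimal sketch. It assumes a hypothetical table `times[p][m]` holding the time processor `p` spent in module `m`; the function name, the layout, and the sample timings are made up for illustration and are not taken from the CCM2 measurements.

```python
def imbalance_by_module(times):
    """Return (mean, net, per_module) lists, one entry per module.

    times[p][m] is the time processor p spent in module m
    (hypothetical layout, not the CCM2 instrumentation format).
    """
    n_proc = len(times)
    n_mod = len(times[0])

    # "Useful" time: mean over processors, module by module.
    mean = [sum(times[p][m] for p in range(n_proc)) / n_proc
            for m in range(n_mod)]

    # Net method: find the processor with the largest total physics time,
    # then charge each module that processor's time minus the mean.
    slowest = max(range(n_proc), key=lambda p: sum(times[p]))
    net = [times[slowest][m] - mean[m] for m in range(n_mod)]

    # Alternative method: per-module maximum minus the mean, regardless
    # of which processor the maximum occurs on.
    per_module = [max(times[p][m] for p in range(n_proc)) - mean[m]
                  for m in range(n_mod)]
    return mean, net, per_module


# Made-up timings (ms) for 3 processors and 2 modules.
times = [
    [10.0, 4.0],
    [30.0, 2.0],
    [20.0, 6.0],
]
mean, net, per_module = imbalance_by_module(times)
print(mean, net, per_module)
```

In this toy case the per-module maxima fall on different processors, so summing the alternative figures overstates the step time (36 ms against the true maximum of 32 ms). With the net method, an individual module can show a negative entry, and the entries always sum to the slowest processor's excess over the mean.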

The times shown are for 1 type-A step (solar radiation with absorptivity and emissivity calculations), 11 type-B steps (radiation), and 24 type-C steps (nonradiation). For the representative period of 36 time steps (one-half of a simulation day), our hypothetical 4096-processor decomposition of model physics consumes 1881 milliseconds, only 1267 milliseconds of which is spent in useful computation. The difference, 614 milliseconds (33 percent of the total), is lost to idle time.
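
The quoted figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Figures quoted in the text (milliseconds).
total_ms = 1881      # net physics time on the slowest processor
useful_ms = 1267     # mean ("useful") time over processors
alt_total_ms = 1938  # total obtained by summing per-module maxima

# Time lost to load imbalance and its share of the total.
lost_ms = total_ms - useful_ms
lost_fraction = lost_ms / total_ms

# How far the alternative calculation sits above the net time.
overestimate = alt_total_ms / total_ms - 1.0

# Step mix in the representative half-day period.
steps = 1 + 11 + 24  # type-A + type-B + type-C

print(lost_ms, round(100 * lost_fraction), round(100 * overestimate), steps)
```

The check reproduces the 614 milliseconds of idle time, the roughly 33 percent loss, the 3 percent gap between the two calculations, and the 36 steps in the period.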

Wed Dec 7 03:37:14 CST 1994