CCM model physics comprises a number of computational modules (Section 2). How seriously a module affects load imbalance in the parallel model depends on how much imbalance there is in the module and how much time the module contributes to the total time spent in physics. Table 2 shows the amount of time processors spent performing useful work in the major modules of CCM2 physics and how much time was lost to load imbalance. The ``useful'' time is the time spent in a module, averaged over all processors, in the hypothetical 4096-processor decomposition with 2 cells per processor. The time lost to imbalance is the time spent in the module on the processor that took the longest overall (summed over all modules) minus that mean.
An alternative way to compute this time would be to take the maximum over processors for a single module of the code and subtract the mean, to determine the inefficiency for that module. However, the maximum in each module does not necessarily occur on the same processor. Therefore, although this method shows the absolute imbalance for a particular module, it would be inappropriate to add together the inefficiencies for different modules. Since we are interested in the net effect of imbalances, we used the former method of calculation, considering the time for each module on the processor with the maximum overall physics time. In practice, we discovered that the overall difference between the two ways of calculating the inefficiency is small: adding together times produced by the alternative calculation generates an average physics time step of 1938 milliseconds, which is only 3 percent above the net time of 1881 milliseconds. This suggests there is little canceling out of imbalances in the physics because the imbalance from the diurnal cycle in the radiation module (RADCTL) dominates the rest of the profile.
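The two ways of calculating the inefficiency can be contrasted in a small sketch. It is illustrative only: the function names and the three-processor, two-module timing table are hypothetical and are not taken from the CCM2 measurements.

```python
# Hypothetical per-processor timings: times[p][m] is the time that
# processor p spends in module m (arbitrary units, not CCM2 data).

def mean_per_module(times):
    """Mean time over processors for each module (the "useful" time)."""
    n_procs = len(times)
    n_modules = len(times[0])
    return [sum(row[m] for row in times) / n_procs for m in range(n_modules)]

def lost_net(times):
    """Net method: per-module time on the processor with the largest
    total physics time, minus the per-module mean."""
    mean = mean_per_module(times)
    slowest = max(times, key=sum)  # processor with maximum overall time
    return [t - mu for t, mu in zip(slowest, mean)]

def lost_per_module(times):
    """Alternative method: per-module maximum over processors minus the mean.
    The maxima may come from different processors."""
    mean = mean_per_module(times)
    return [max(row[m] for row in times) - mean[m] for m in range(len(mean))]

# Assumed example: processor 0 is slowest overall, but processor 1
# is slowest in the second module.
times = [
    [10.0, 2.0],
    [4.0, 5.0],
    [4.0, 2.0],
]

net = lost_net(times)                  # [4.0, -1.0], sums to 3.0
per_module = lost_per_module(times)    # [4.0, 2.0], sums to 6.0
print(sum(net), sum(per_module))
```

In this made-up table the per-module maxima fall on different processors, so summing the alternative inefficiencies overstates the net loss. When a single module dominates, as reported for RADCTL, the two sums would be expected to lie close together.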
The times shown are for 1 type-A step (solar radiation with absorptivity and emissivity calculations), 11 type-B steps (radiation), and 24 type-C steps (nonradiation). For the representative period of 36 time steps (one-half of a simulation day), our hypothetical 4096-processor decomposition of model physics consumes 1881 milliseconds, of which only 1267 milliseconds are spent in useful computation. The difference, 614 milliseconds (33 percent), is lost to idle time.
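The figures quoted above can be checked with a few lines of arithmetic. The variable names are chosen for illustration; the numbers are those stated in the text.

```python
# Step counts in the representative period, as given in the text.
steps = {"A": 1, "B": 11, "C": 24}
total_steps = sum(steps.values())       # 36 time steps

total_ms = 1881    # physics time on the slowest processor
useful_ms = 1267   # mean (useful) physics time over processors

idle_ms = total_ms - useful_ms          # time lost to load imbalance
idle_fraction = idle_ms / total_ms      # share of physics time spent idle
efficiency = useful_ms / total_ms       # share spent in useful computation

print(total_steps, idle_ms, round(100 * idle_fraction), round(100 * efficiency))
```

The idle share is measured against the total time of 1881 milliseconds, which corresponds to a parallel efficiency of roughly 67 percent for the physics in this hypothetical decomposition.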