For the hypothetical 4096-processor decomposition with two cells per processor, the time spent in physics over 36 time steps is 62 milliseconds per time step. This figure is the sum, over time steps, of the maximum time over the grid of 2-cell partitions at each time step, divided by the number of steps. The maximum need not occur at the same 2-cell partition at each time step. However, because CCM dynamics imposes a synchronization between calls to physics on successive time steps, and because physics is called for all grid points before dynamics begins in a given time step, it is reasonable to take the sum of the individual per-step maxima as the time spent in physics over the series of time steps. By similar reasoning, one may sum the per-step mean time over the 2-cell partitions in the grid, divide by the number of steps, and call this the average time spent per time step. This average or ``ideal'' time was 45 milliseconds per step and represents the time physics would have taken under perfect load balance. Dividing this ideal time by the maximum time gives an efficiency of 72.6 percent for physics as a whole (an inefficiency of 27.4 percent). The time that would be lost to load imbalance in this decomposition is 17 milliseconds per step, the difference between the maximum and the mean.
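The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration of the stated definitions, not the instrumentation used in the study: the function name `physics_load_balance`, the `times[step][partition]` layout, and the sample timings are all assumptions made for the example.

```python
def physics_load_balance(times):
    """Summarize physics load balance from per-partition timings.

    times[s][p] is the physics time (in ms) of partition p at time step s.
    This layout is an assumption made for the sketch.
    Returns (max_time, ideal_time, efficiency, lost_time), all per time step.
    """
    n_steps = len(times)
    # Sum of the per-step maxima, divided by the number of steps.
    max_time = sum(max(step) for step in times) / n_steps
    # Sum of the per-step means, divided by the number of steps ("ideal" time).
    ideal_time = sum(sum(step) / len(step) for step in times) / n_steps
    efficiency = ideal_time / max_time
    lost_time = max_time - ideal_time
    return max_time, ideal_time, efficiency, lost_time


# Hypothetical timings: 2 time steps, 4 partitions (not measured data).
sample = [[40, 50, 60, 30],
          [70, 45, 35, 50]]
max_time, ideal_time, efficiency, lost_time = physics_load_balance(sample)

# Check against the figures quoted in the text: 45 ms ideal, 62 ms maximum.
reported_efficiency = round(45 / 62 * 100, 1)  # 72.6
reported_lost = 62 - 45                        # 17 ms per step
```

Note that the slowest partition differs between the two sample steps, which is why the per-step maxima are summed instead of following a single partition through time.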
The effect of physics inefficiency on total model performance depends on how efficiently the rest of the model, in particular dynamics, performs. Dynamics in CCM is primarily communication bound, though there is also some computational inefficiency owing to an uneven distribution of Fourier coefficients between processors in spectral dynamics (for wind velocity and temperature) and a disproportionate amount of work at the poles in the semi-Lagrangian dynamics (for moisture). At present, in real runs of the code on the Intel Touchstone DELTA, physics consumes about a third of the total run time when running on the full machine (Table 4). Roughly speaking, for the current implementation of PCCM2 on the full DELTA, the effect on total run time of a 33 percent (Section 3.2) computational imbalance in physics will be around 10 percent. As communication efficiency improves with tuning of spectral and semi-Lagrangian dynamics, the effect of physics load imbalance in PCCM2 will become more pronounced.
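The rough 10 percent figure can be reproduced with a back-of-the-envelope calculation, sketched here under two assumptions that the text does not spell out: that the 33 percent imbalance is the fraction of physics time lost, and that its share of total run time is that fraction multiplied by the physics share of run time. The function names and the factor-of-two dynamics speedup are hypothetical.

```python
def imbalance_share(physics_fraction, inefficiency):
    """Fraction of total run time lost to physics load imbalance.

    Assumes the loss is simply (physics share of run time) x (fraction of
    physics time lost); this is an interpretation, not a quoted formula.
    """
    return physics_fraction * inefficiency


def physics_fraction_after_speedup(physics_fraction, dynamics_speedup):
    """Physics share of run time if the rest of the model runs faster.

    Physics time is held fixed; everything else is divided by
    dynamics_speedup (a hypothetical tuning gain).
    """
    rest = (1.0 - physics_fraction) / dynamics_speedup
    return physics_fraction / (physics_fraction + rest)


# Physics is about a third of run time with a 33 percent imbalance.
current = imbalance_share(1.0 / 3.0, 0.33)        # about 0.11

# Hypothetical: dynamics and communication become twice as fast.
tuned_fraction = physics_fraction_after_speedup(1.0 / 3.0, 2.0)  # 0.5
tuned = imbalance_share(tuned_fraction, 0.33)     # about 0.165
```

Under these assumptions the same physics imbalance costs a larger share of run time once dynamics is tuned, which is the trend described above.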