Over the course of 36 time steps for the hypothetical 4096-processor
decomposition with 2 cells per processor,
the time spent in physics is 62
milliseconds per time step. This time is the sum (over time steps) of
the maximum time (over the grid of 2-cell partitions) at each
time step, divided by the number of steps.
The maximum need not occur at the same 2-cell partition
at each time step.
However, since there is a synchronization imposed by CCM dynamics
between calls to physics on successive time steps, and because physics
is called for all grid points before the onset of dynamics in a given
time step,
it is reasonable to consider the sum of the individual maxima at each
time step as the time spent in physics for the series of time steps.
By similar reasoning, one may sum the mean time
over 2-cell partitions in the grid from each step, divide
by the number of steps, and call
this the average time spent per time step. This average or
``ideal'' time was 45 milliseconds per step and represents
the time physics would have taken in a situation of
perfect balance. Dividing this ideal time by the maximum time gives an
efficiency of 72.6 percent for physics as a whole (or an inefficiency of 27.4 percent). The amount of time that would be lost
to load imbalance in this decomposition is 17 milliseconds per step,
the difference between the maximum and the mean.
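
The bookkeeping described above can be written out as a minimal sketch. It assumes the per-partition physics timings are available as a list of time steps, each holding one time (in milliseconds) per 2-cell partition; the function name `physics_load_balance` and the sample timings are hypothetical and are not taken from PCCM2 or from the measurements reported here.

```python
from statistics import mean


def physics_load_balance(times):
    """Summarize physics load balance.

    times[s][p] is the physics time (ms) of partition p at time step s.
    This layout is an assumption made for illustration.
    """
    n_steps = len(times)
    # Maximum over partitions at each step, summed over steps,
    # divided by the number of steps.
    max_time = sum(max(step) for step in times) / n_steps
    # Mean over partitions at each step, summed over steps,
    # divided by the number of steps (the "ideal" time).
    ideal_time = sum(mean(step) for step in times) / n_steps
    efficiency = ideal_time / max_time
    lost_time = max_time - ideal_time
    return max_time, ideal_time, efficiency, lost_time


# Made-up timings: 2 time steps, 4 partitions. The slowest partition
# differs from one step to the next, as the text allows.
sample = [
    [40.0, 60.0, 50.0, 50.0],
    [70.0, 30.0, 50.0, 50.0],
]
max_t, ideal_t, eff, lost = physics_load_balance(sample)
print(max_t, ideal_t, round(eff, 3), lost)

# The reported figures follow the same arithmetic:
reported_efficiency = 45.0 / 62.0  # ideal time / maximum time
reported_lost = 62.0 - 45.0        # milliseconds per step
```

Taking the maximum separately at each step, rather than the maximum of the per-partition totals, reflects the synchronization between successive calls to physics: the slowest partition at each step sets the time for that step.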
The effect of physics inefficiency on total model performance depends on how efficiently the rest of the model, in particular dynamics, is performing. Dynamics in CCM is primarily communication bound, though there is also some computational inefficiency owing to an uneven distribution of Fourier coefficients between processors in spectral dynamics (for wind velocity and temperature) and a disproportionate amount of work at the poles in the semi-Lagrangian dynamics (for moisture). At present, in real runs of the code on the Intel Touchstone DELTA, physics consumes about a third of the total run time when running on the full machine (Table 4). Roughly speaking, for the current implementation of PCCM2 on the full DELTA, the effect of a 33 percent (Section 3.2) computational imbalance in physics is therefore around 10 percent of the total run time. As communication efficiency improves with tuning of spectral and semi-Lagrangian dynamics, the effect of physics load imbalance in PCCM2 will become more pronounced.
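
The rough estimate above can be reproduced with a short sketch. It assumes that the cost to the whole model is simply the fraction of run time spent in physics multiplied by the physics imbalance; this simple product is one reading of the estimate, not a formula stated in the text. The function name `imbalance_cost` and the second physics fraction of 0.5 are hypothetical, chosen only to show the trend as dynamics becomes more efficient.

```python
def imbalance_cost(physics_fraction, physics_imbalance):
    """Fraction of total run time attributed to physics load imbalance.

    Assumes the cost is the simple product of the two fractions.
    """
    return physics_fraction * physics_imbalance


# Values from the text: physics is about a third of the run time on the
# full DELTA, with a 33 percent computational imbalance in physics.
current = imbalance_cost(1.0 / 3.0, 0.33)

# Hypothetical case: dynamics has been tuned so that physics accounts
# for half of the run time, with the imbalance unchanged.
after_tuning = imbalance_cost(0.5, 0.33)

print(f"current: {current:.1%}, after tuning: {after_tuning:.1%}")
```

Under this assumption the same physics imbalance costs a larger share of the run as the share of time spent in dynamics shrinks.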