- ...microseconds
- On the DELTA, the system timing function used was HWCLOCK.
- ...decomposition,
- While the contribution to load from each cell is a fixed quantity,
the load imbalance that results depends on the decomposition, that is,
on how the cells are allocated to processors. The size and shape of
the partitions affect load imbalance.
(Take the trivial case of all cells grouped onto one processor in the
shape of the grid itself: the inefficiency due to load imbalance
is zero.) Therefore, the timing and efficiency numbers quoted in this
discussion are specific to the hypothetical decomposition in force.
- ...step
- This was not true in CCM2, the vector/shared-memory parallel version
of the model, where the call to physics for each latitude was followed by
the call to the FFT for that latitude. To block FFT communications
efficiently in the parallel code, PCCM2 separated the calls to
physics and the calls to the FFT into separate loops over latitude.
In each time step, physics for all latitudes is complete before the
synchronization imposed by message passing in the spectral dynamics.
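
The difference in call order described in the last footnote can be sketched as follows. This is only an illustration of the loop restructuring, not code from CCM2 or PCCM2: the function names, the tuple representation of a call, and the latitude count are all hypothetical.

```python
# Illustrative sketch only; all names here are hypothetical, not model code.

def interleaved_schedule(nlat):
    """CCM2-style order: physics, then FFT, for each latitude in turn."""
    calls = []
    for lat in range(nlat):
        calls.append(("physics", lat))
        calls.append(("fft", lat))
    return calls


def split_schedule(nlat):
    """PCCM2-style order: one loop over latitude for physics, then one for the FFT."""
    calls = [("physics", lat) for lat in range(nlat)]
    calls += [("fft", lat) for lat in range(nlat)]
    return calls


def physics_done_before_fft(calls):
    """Return True if every physics call precedes the first FFT call."""
    last_physics = max(i for i, (name, _) in enumerate(calls) if name == "physics")
    first_fft = min(i for i, (name, _) in enumerate(calls) if name == "fft")
    return last_physics < first_fft
```

With the split order, all physics calls in a time step finish before any FFT call begins, which is the property the footnote relies on; with the interleaved order and more than one latitude, they do not.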