MPMM Performance Data, Summer 1995

The following MPMM performance data were collected in July and August 1995 on the NAS SP2 and on the Cray T3D at NERSC.

The performance data shown are for a single-domain (no nest), 100 km resolution, non-hydrostatic, full-physics problem (radiation, Grell cumulus, explicit moisture, mixed-phase ice physics, Blackadar PBL). The grid is 61 by 61 points horizontally with 23 vertical levels.

Speedup

This graph shows relative speedup on both the IBM SP2 and the Cray T3D from 4 through 128 processors. Ideal speedup, in which speed increases in exact proportion to the number of processors, is plotted for comparison. Parallel efficiency (observed speedup divided by ideal speedup) is 79 percent on the T3D and 63 percent on the SP2.
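As a minimal sketch, relative speedup and parallel efficiency can be computed from wall-clock times, taking the 4-processor run as the baseline. The function names and the timings below are hypothetical illustrations, not measured MPMM numbers.

```python
def relative_speedup(t_base: float, t_p: float) -> float:
    """Speedup of a run relative to the baseline run (t_base / t_p)."""
    return t_base / t_p


def ideal_speedup(p_base: int, p: int) -> float:
    """Speedup if run time fell exactly in proportion to processor count."""
    return p / p_base


def parallel_efficiency(t_base: float, p_base: int, t_p: float, p: int) -> float:
    """Observed speedup divided by ideal speedup."""
    return relative_speedup(t_base, t_p) / ideal_speedup(p_base, p)


# Hypothetical timings in seconds, keyed by processor count.
timings = {4: 100.0, 128: 4.0}
eff = parallel_efficiency(timings[4], 4, timings[128], 128)
print(f"efficiency at 128 processors: {eff:.0%}")  # prints: efficiency at 128 processors: 78%
```

Because both speedup and ideal speedup are measured against the same baseline, the efficiency does not depend on whether the baseline run is itself counted as "4" or "1" on the speedup axis.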

The extremely low latency and high bandwidth of the T3D's interprocessor communication hardware and software no doubt play a role in its greater efficiency relative to the SP2. Of greater impact, however, is the T3D's dramatically lower per-processor performance: because each T3D node computes more slowly, communication makes up a smaller fraction of total run time, so the same communication overhead costs less efficiency. Thus, although the T3D looks attractive in terms of parallel efficiency and speedup, the SP2 is 3.5 times faster, as the next section shows.
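The effect of slower nodes on efficiency can be illustrated with a toy model, which is an assumption for illustration and not the MPMM's actual cost model: computation divides evenly across p nodes, while a fixed communication cost does not shrink. To isolate the compute-rate effect, both hypothetical machines are given the same communication cost; the workload, rates, and communication time are all made up.

```python
def run_time(work_mflop: float, p: int, mflops_per_node: float,
             comm_seconds: float) -> float:
    """Toy model: perfectly divisible computation plus a fixed
    communication cost that does not shrink as p grows."""
    return work_mflop / (p * mflops_per_node) + comm_seconds


def efficiency(work_mflop: float, p_base: int, p: int,
               mflops_per_node: float, comm_seconds: float) -> float:
    """Observed speedup over the p_base run divided by ideal speedup p / p_base."""
    t_base = run_time(work_mflop, p_base, mflops_per_node, comm_seconds)
    t_p = run_time(work_mflop, p, mflops_per_node, comm_seconds)
    return (t_base / t_p) / (p / p_base)


WORK = 1_000_000.0  # hypothetical total work, Mflop
COMM = 50.0         # hypothetical communication time, s (same on both machines)

for name, rate in [("fast nodes", 30.0), ("slow nodes", 7.5)]:
    eff = efficiency(WORK, 4, 128, rate, COMM)
    t128 = run_time(WORK, 128, rate, COMM)
    print(f"{name}: efficiency {eff:.2f}, time on 128 nodes {t128:.0f} s")
```

With these made-up inputs the slow-node machine reports the higher parallel efficiency yet is still several times slower on 128 nodes, which is the pattern described above.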

Performance

When timings are adjusted to discount the inefficiency due to parallelism, per-node performance is 30.6 Mflop/sec on the SP2 and 7.3 Mflop/sec on the T3D. I suspect that the unusually small primary cache on each T3D node is a factor. Note that the model runs at 32-bit precision on the SP2 but at 64-bit precision on the T3D. However, 64-bit precision is not strictly necessary for this code, and it is used on the T3D only because it is the only precision available there. Moreover, on RS/6000 platforms 64-bit per-processor computational rates are generally slightly higher, not lower, than 32-bit rates, since the processor computes internally at the higher precision; running at 32-bit precision therefore does not inflate the SP2 figure.
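The per-node rates above and the parallel efficiencies from the Speedup section can be combined into a rough consistency check on the 3.5x figure. This Python sketch assumes (the report does not say) that the quoted efficiencies and per-node rates refer to the same processor count, and that delivered performance is simply the per-node rate times parallel efficiency; `delivered_rate` is a hypothetical helper.

```python
# Per-node rates (Mflop/sec, parallel inefficiency discounted) and
# parallel efficiencies, as quoted in this report.
SP2_RATE, SP2_EFFICIENCY = 30.6, 0.63
T3D_RATE, T3D_EFFICIENCY = 7.3, 0.79


def delivered_rate(node_rate: float, efficiency: float) -> float:
    """Per-node rate actually delivered once parallel inefficiency is charged back."""
    return node_rate * efficiency


raw_ratio = SP2_RATE / T3D_RATE
delivered_ratio = (delivered_rate(SP2_RATE, SP2_EFFICIENCY)
                   / delivered_rate(T3D_RATE, T3D_EFFICIENCY))
print(f"per-node ratio {raw_ratio:.1f}, delivered ratio {delivered_ratio:.1f}")
# prints: per-node ratio 4.2, delivered ratio 3.3
```

Under those assumptions the SP2 delivers about 3.3 times the T3D's throughput at equal node counts, close to but not exactly the 3.5 times quoted; the gap may reflect the processor count or rounding behind the original comparison, so this is only a rough check.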
