In 1991-1992, the Mathematics and Computer Science Division at Argonne
National Laboratory began making plans for a change in its research focus in
high-performance computing. Since 1983 it had carried out research in many
areas of parallel computing, and had operated the Advanced Computing Research
Facility, home to early versions of parallel computers from a variety of
manufacturers. Work with these machines had made significant contributions to the
understanding of parallel algorithms for many scientific problems and to the
development of tools for expressing these algorithms portably. But it was time
for a new direction. The new direction was to involve application scientists
in the \emph{use} of this parallel computing knowledge, in order to
demonstrate the cost-effectiveness of parallel computing for large-scale
scientific problems.
Such a goal required a different machine acquisition strategy from that of
previous years. Although some forward-looking application scientists had used
the machines of the ACRF to acquaint themselves with parallel computing
issues, most had kept to their traditional supercomputers, simply because the
research machines of the ACRF, suitable as they were for computer science
research, did not have the computing power necessary for delivering research
results in the new field of computational science.
The Argonne-based consortium ``auditioned'' all the then-current parallel
computer vendors. They chose the IBM SP1, then just becoming available,
configured in a larger-than-officially-available size (128 nodes) and with
custom-designed I/O hardware. It was anticipated that the SP1 would provide
an environment to which existing tools and applications could be ported
quickly, providing early evidence of its usability as a first-class
scientific instrument.
This article is an account of the first few months of experience with the IBM
SP1. The projects reported here consist of tools projects (see
Table~\ref{tabtools}), reflecting both the porting of existing tools to the
SP1 and, in some cases, the development of new ones, and application
projects (see Table~\ref{tabapps}), many of which use one or more of the tools.
Table: Parallel tools described in this paper
Table: Applications described in this paper
All timings and performance results in this document are preliminary.
Because the IBM SP1 runs a full Unix on each node, it is more difficult
than on MPPs that run a single-user operating system to ensure that no
processes other than the program being benchmarked are using resources.
The performance figures given here were not obtained on a
stand-alone system, and they reflect use of the tb0 communication adaptors.
Many of the results presented here are on relatively small numbers of
processors; again, this is primarily because little time was made
available for single-user benchmarks.