Introduction

In 1991-1992, the Mathematics and Computer Science Division at Argonne National Laboratory began making plans for a change in its research focus in high-performance computing. Since 1983 it had carried out research in many areas of parallel computing, and had operated the Advanced Computing Research Facility, home to early versions of parallel computers from a variety of manufacturers. Work with these machines had made significant contributions to the understanding of parallel algorithms for many scientific problems and to the development of tools for expressing these algorithms portably. But it was time for a new direction.

The new direction was to involve application scientists in the use of this parallel computing knowledge, in order to demonstrate the cost-effectiveness of parallel computing for large-scale scientific problems. Such a goal required a different machine acquisition strategy from that of previous years. Although some forward-looking application scientists had used the machines of the ACRF to acquaint themselves with parallel computing issues, most had kept to their traditional supercomputers, simply because the research machines of the ACRF, suitable as they were for computer science research, did not have the computing power necessary for delivering research results in the new field of computational science.

The Argonne-based consortium ``auditioned'' all the then-current parallel computer vendors. They chose the IBM SP1, then just becoming available, configured in a larger-than-officially-available size (128 nodes) and with custom-designed I/O hardware. It was anticipated that the SP1 would provide an environment to which existing tools and applications could be ported quickly, providing early evidence of its usability as a first-class scientific instrument. This article is an account of the first few months of experience with the IBM SP1.
The projects reported here consist of tools projects (see the table of parallel tools below), reflecting both the porting of existing tools to the SP1 and, in some cases, the development of new ones, and application projects (see the table of applications below), many of which use one or more of the tools.

Table: Parallel tools described in this paper

Table: Applications described in this paper

All timings and performance results in this document are preliminary. Because the IBM SP1 runs a full Unix on each node, it is more difficult than on MPPs that run a single-user operating system to ensure that no processes other than the program being benchmarked are using resources. The performance figures given here were not obtained on a stand-alone system, and they reflect use of the tb0 communication adaptors. Many of the results presented here are for relatively small numbers of processors; again, this is primarily because little time was made available for single-user benchmarks.