History of MCS Division High-Performance Computing

Advanced Computing Research Facility: 1984-1992

The Advanced Computing Research Facility (ACRF) was established in 1984 in recognition of the role that parallel computers would play in the future of scientific computing. The mission of the ACRF was to encourage experimentation on computers with innovative designs, to assess the suitability of diverse machines for specific applications, and to operate as a national user facility. The ACRF quickly developed ties with computer manufacturers at the forefront of computer technology. The first machine to be acquired was a Denelcor HEP, at that time the only multiple-instruction stream, multiple-data stream (MIMD) computer available commercially. From that modest beginning, the ACRF expanded to include nine multiprocessors:

  • Thinking Machines CM-2, with 16,384 one-bit processors and a total of 128 megabytes of memory.
  • Active Memory Technology DAP-510, with 1,024 one-bit processors having 8 kilobytes of memory per processor.
  • BBN TC 2000 (Butterfly II), with 32 processors and 128 megabytes of memory.
  • Intel iPSC/d5 hypercube, with 32 nodes each having 512 kilobytes of memory.
  • Sequent Balance 21000, with 24 processors sharing 24 megabytes of memory.
  • Encore Multimax, with 20 processors sharing 64 megabytes of memory.
  • Intel iPSC/d4 hypercube, with 16 vector processors each having 1.5 megabytes of memory.
  • Alliant FX/8, with 8 vector processors sharing 64 megabytes of memory.
  • Ardent Titan graphics supercomputer, with 4 vector processors and 64 megabytes of memory.
These experimental machines were linked to each other and to a cluster of 50 workstations. The ACRF computers also were accessible through local and long-distance telephone lines and national networks.

    Research and Development

    Using the diverse computers in the ACRF, investigators developed new techniques to exploit parallelism in scientific research. Emphasis was given to the interaction between algorithms, the software environment, and advanced computer architectures; the portability of programs without a sacrifice in performance; and the design of parallel programming tools and languages.

    Programs

    As a national user facility, the ACRF encouraged researchers throughout the world to use its advanced computers for experimentation. During a typical year of operation, more than one hundred projects were approved by the ACRF staff.

    Also established at the ACRF were two affiliates programs: one for industry (coordinated by the ARCH Development Corporation), to provide a mechanism for exchanging ideas, testing new techniques, and transferring results in advanced computing between the research laboratory and industry; and the other for universities interested in expanding their knowledge of parallel computing and in using the ACRF multiprocessing systems. Several universities also coordinated classroom courses with the ACRF staff, using the advanced computers to provide students with hands-on experience on a wide variety of multiprocessors.

    The ACRF machines also were used for several two-week Summer Institutes in Parallel Computing, sponsored in part by the National Science Foundation and by the Department of Energy to train graduate students and postdoctoral researchers in advanced computing techniques.

    High-Performance Computing Research Center: 1992-1997

    In 1992, at DOE's request, Argonne changed the focus of the ACRF from experimental parallel machines to experimental production machines. The newly renamed facility, called the High-Performance Computing Research Center, emphasized collaborative research with computational scientists.

    In April 1993, funding was authorized for the acquisition of a 128-node IBM POWERparallel System-1. By September, the acquisition was complete and installation had begun.

    Why an SP?

    The IBM POWERparallel System was the first scalable, parallel system to offer multiple levels of I/O capability. Each node had substantial disk space and memory, providing local storage and memory commensurate with the processing power of the node and thus allowing large problems and problem segments to run on individual nodes. The internode network architecture provided relatively high bandwidth as well as good behavior with irregular communication patterns, permitting the efficient use of advanced algorithms that adapt to the structure of the problem. Additionally, the system included a large, high-performance I/O architecture with a significant number of connections to the external I/O network, thus providing very high aggregate bandwidth to secondary storage and other external devices and network connections.

    Phased Installation

    The SP1 was installed in stages, beginning in March 1993 with a beta-test version (32 nodes). Argonne staff worked with IBM to improve the design of this early system. For example, one potentially time-consuming step, the "boot process" for the nodes, was of particular concern to IBM. Argonne identified a better method that dramatically reduced the boot time from approximately 2 hours to 15 minutes. Argonne staff also helped identify the best arrangement for the console machines and clarify the requirements for AIX (IBM's version of Unix) in a parallel environment.

    In August 1993, the next phase of the SP1, a 64-node system, was installed. This was IBM's first 64-way machine to be implemented outside of its own research center. Consequently, the Argonne implementation was regarded as an important test case for IBM. Argonne staff helped validate alternative configurations to determine which was most effective.

    In September, Argonne once again expanded the SP1, this time to 128 nodes. Since this was the only existing 128-node Scalable POWERparallel System, IBM staff used the Argonne installation as their own testbed. We ran multiple tests of the software to ensure that the machine was running not only correctly but also efficiently. Initial tests of the machine were run without the high-speed switch. Researchers ported various programming tools and libraries (such as Fortran M, p4, and Chameleon) as well as application programs in diverse areas, including climate modeling, automatic differentiation, and automated reasoning. These early experiments demonstrated the ease with which programs could be ported, with little change required.

    The next step was to install the high-speed switch. This switch had a bandwidth four times greater than that of the previous switch and thus allowed for extremely rapid communication among processor nodes. The new switch was particularly important for problems with irregular or unpredictable communication patterns among nodes, such as are encountered in global climate simulations. In the early part of 1995 we modified the system configuration again by incorporating eight I/O server nodes "inboard," attached directly to the high-speed switch. This upgrade substantially enhanced the SP's I/O capabilities, a recognized bottleneck in most massively parallel systems. The enhanced I/O system had the added advantage that programs that did not need high I/O transfer rates (e.g., Monte Carlo nuclear physics investigations) could specify other processor nodes, leaving the I/O nodes free for I/O-intensive work (e.g., climate modeling).
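
    To make the idea of dedicated I/O nodes concrete, the sketch below shows one common way a message-passing program of that era could split its ranks into a small I/O-server group and a compute group, using the standard MPI routine MPI_Comm_split. This is an illustrative assumption rather than code from the Argonne SP work; the figure of eight servers comes from the text, but the names and structure are hypothetical.

        /* Hypothetical sketch: partition MPI_COMM_WORLD into an I/O-server
         * group and a compute group, echoing the dedicated I/O nodes
         * described above.  Only the count of eight servers comes from
         * the text; everything else is illustrative. */
        #include <mpi.h>

        #define NUM_IO_SERVERS 8

        int main(int argc, char **argv)
        {
            int rank, is_io_server;
            MPI_Comm group_comm;   /* communicator for this rank's group */

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            /* Ranks 0..7 act as I/O servers; all other ranks compute.
             * MPI_Comm_split gives each group its own communicator. */
            is_io_server = (rank < NUM_IO_SERVERS);
            MPI_Comm_split(MPI_COMM_WORLD, is_io_server, rank, &group_comm);

            if (is_io_server) {
                /* ... service I/O requests from the compute ranks ... */
            } else {
                /* ... run the application, shipping output to the servers ... */
            }

            MPI_Comm_free(&group_comm);
            MPI_Finalize();
            return 0;
        }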

    Research

    In October 1993, a cooperative research and development agreement (CRADA) was signed between IBM and Argonne to explore tools, software, and I/O systems for ensuring that the full potential of the SP1 was realized.

    To encourage widespread use of the SP, we held a workshop in March 1994. This two-day meeting was aimed principally at potential non-Argonne users; eighty representatives from universities and industry attended the meeting. A repeated theme of the applications programmers was the need for high-performance parallel I/O systems. We therefore began exploring various approaches to using the SP I/O system. The SP also was used to examine fundamental issues of code reuse and data management in conjunction with a number of software tools projects, including MPI. These efforts took advantage of the specialized capabilities of the SP operating system to optimize communication performance.
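
    As a point of reference for how such codes were written, the following is a minimal, self-contained MPI program in C of the kind that ports with little or no change across message-passing machines such as the SP. It simply passes a token around a ring of processes; the example is illustrative and not drawn from the projects described above.

        /* Minimal portable MPI example (illustrative only): pass an integer
         * token around a ring of processes.  Run with at least two ranks. */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int rank, size, token;
            MPI_Status status;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            if (rank == 0) {
                /* Rank 0 starts the token and receives it after a full circuit. */
                token = 42;
                MPI_Send(&token, 1, MPI_INT, 1 % size, 0, MPI_COMM_WORLD);
                MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, &status);
                printf("rank 0 got token %d back from rank %d\n", token, size - 1);
            } else {
                /* Every other rank receives from its left neighbor and
                 * forwards to its right neighbor. */
                MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
                MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
            }

            MPI_Finalize();
            return 0;
        }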

    In 1994, in a joint effort with IBM and supported in part by a Laboratory Directed Research and Development grant, we began integrating the SP with multimedia capabilities. By enabling systems such as the IBM SP to process voice and image data along with the scientific data resulting from computations, we hoped not only to expand the types of applications suitable for parallel computing (e.g., digital libraries) but also to facilitate the use of modern interface technology for those using the SP system for scientific applications (e.g., coupling animation and voice annotation with scientific data sets).

    In 1995, we held a half-day workshop on the SP. The workshop featured an overview of the SP, presentations on the tools that had been developed (including the new job scheduler developed at Argonne), and a review of the libraries that were available (including MPI, Fortran M, and PETSc). Among the diverse applications discussed were nuclear-physics Monte Carlo calculations, task farming for multidimensional parameter space searches, simulation of enzyme reactions, quantum chemistry applications, numerical weather prediction, and climate modeling.
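
    The task-farming pattern mentioned above is easy to express in MPI; the sketch below shows the general manager/worker structure, with a placeholder evaluate() standing in for a real model run. It is an assumed illustration of the technique, not the code used for the parameter-space searches described at the workshop.

        /* Illustrative manager/worker task farm in MPI: rank 0 hands out
         * points of a parameter space and gathers results.  NUM_TASKS and
         * evaluate() are placeholders, not values from the Argonne work. */
        #include <mpi.h>

        #define NUM_TASKS 1000
        #define TAG_WORK  1
        #define TAG_STOP  2

        static double evaluate(int task)
        {
            return (double) task;      /* placeholder for a real model run */
        }

        int main(int argc, char **argv)
        {
            int rank, size, w, task, stop = 0;
            double result;
            MPI_Status status;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            if (rank == 0) {                               /* manager */
                int next = 0, done = 0;
                /* Prime each worker with one task. */
                for (w = 1; w < size && next < NUM_TASKS; w++, next++)
                    MPI_Send(&next, 1, MPI_INT, w, TAG_WORK, MPI_COMM_WORLD);
                /* Hand out remaining tasks as results come back. */
                while (done < next) {
                    MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0,
                             MPI_COMM_WORLD, &status);
                    done++;
                    if (next < NUM_TASKS) {
                        MPI_Send(&next, 1, MPI_INT, status.MPI_SOURCE, TAG_WORK,
                                 MPI_COMM_WORLD);
                        next++;
                    }
                }
                /* Tell every worker to stop. */
                for (w = 1; w < size; w++)
                    MPI_Send(&stop, 1, MPI_INT, w, TAG_STOP, MPI_COMM_WORLD);
            } else {                                       /* worker */
                for (;;) {
                    MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG,
                             MPI_COMM_WORLD, &status);
                    if (status.MPI_TAG == TAG_STOP)
                        break;
                    result = evaluate(task);
                    MPI_Send(&result, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
                }
            }

            MPI_Finalize();
            return 0;
        }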

    In 1996, we held a briefing workshop for IBM executives on high-performance computing and current research projects in the HPCRC.

    Center for Computational Science and Technology: 1997-1999

    In 1997, Argonne's supercomputing center was recognized by DOE as one of the nation's four high-end resource providers. Its new mission, as a Center for Computational Science and Technology (CCST), was to (1) operate the most advanced large-scale, highly parallel, scalable computing systems, (2) develop software tools for improving the utility of the systems, and (3) provide access for users from industry, national laboratories, and universities.

    At the core of the CCST was the Quad, comprising four main components:

  • a parallel compute engine (144-node IBM SP) providing the core compute power for the underlying computational science problems;
  • a high-capacity, high-bandwidth storage server;
  • a graphics rendering and geometry server (128-processor SGI Onyx2 Reality Monster with 12 InfiniteReality engines); and
  • a digital media server for handling multiple simultaneous interactive media data streams.

    Friendly user testing began in early March 1997, with production mode achieved shortly thereafter. The CCST at that point became a major resource for several Grand Challenge Applications.

    Advanced Computing Facilities and Chiba City: 1999-

    Following the successful completion of the Grand Challenge Applications, the CCST was reorganized. MCS continued to operate two machines: the Origin2000 and the IBM SP.

    The 128-CPU SGI Origin2000 computer served two purposes. First, with 12 InfiniteReality engines, it was used as the primary driver for the CAVE, several I-Desks, and other high-end visualization systems in the division. Second, it was used by computer scientists and computational scientists for large shared-memory experiments. In mid-2001, use of the Origin2000 was discontinued.

    The 100-node IBM SP system continues to be used reliably, day-in and day-out, for production computing and computer science.

    In October 1999, MCS reached a new level of support for scalable computer science with the installation of Chiba City, a 256-node Linux cluster with a theoretical peak performance of 256 gigaflops. The cluster consists of 512 Pentium III 550 MHz CPUs for computation, 32 Pentium III 550 MHz CPUs for visualization, 8 Xeon systems for storage, 8.4 TB of disk, 153 GB of memory, a Myrinet high-performance interconnect, and switched 100BaseT Ethernet for the management interconnect.

    Chiba City was built specifically to support the MCS mission of investigating scalable computer science. The cluster runs entirely open source software and is available to researchers who are developing open source software and need to test the scalability of their code and algorithms.