1998 Abstracts of MCS Reports and Preprints

Reports
| M. Hereld, B. Nickless, and R. Stevens, "Implementation of a Distributed Multiterabyte Storage System," ANL/MCS-TM-235, October 1998. | The University of Chicago and Argonne National Laboratory are building a highly distributed storage system to meet the needs of a cross-disciplinary group of scientists and to investigate the issues involved in implementing such a system. The storage system is based on IBM 3590/3494 tape technology managed by ADSM software. High-speed wide-area networking is required to support the distributed user community. NFS, DFS, FTP, and other user access methods are supported. A simulation project undertaken to guide system development has provided useful insights even in the calibration and design phases. |
| C. H. Bischof, P. Eberhard, and P. D. Hovland, eds., "Second Argonne Theory Institute on Differentiation of Computational Approximations to Functions," Technical Memorandum ANL/MCS-TM-236, June 1998. | A Theory Institute was held at Argonne National Laboratory on May 18-20, 1998. It brought together 38 researchers from the U.S., Great Britain, France, and Germany. Mathematicians, computer scientists, physicists, and engineers from diverse disciplines discussed advances in automatic differentiation (AD) theory and software and described benefits from applying AD methods in application areas. These areas include fluid mechanics, structural engineering, optimization, meteorology, and computational mathematics for the solution of ordinary differential equations (ODEs) or differential algebraic equations (DAEs). |
Preprints
| J. Michalakes, "Same-Source Parallel Implementation of the PSU/NCAR MM5," Preprint ANL/MCS-P702-1297, March 1998. | We describe an IBM-funded project to develop a same-source parallel implementation of the PSU/NCAR MM5 using FLIC, the Fortran Loop and Index Converter. The resulting source is nearly line-for-line identical with the original source code. The result is an efficient distributed memory parallel option to MM5 that can be seamlessly integrated into the official version. |
| J. Czyzyk, T. Wisniewski, and S. J. Wright, "Optimization Case Studies in the NEOS Guide," SIAM Review 41(1) (1999), pp. 148-163. Also Preprint ANL/MCS-P704-0198, revised July 1998. | The point of applied mathematics is that the theoretical and algorithmic developments at the core of the subject are relevant to important applications in the real world. In studying the subject, we learn the usefulness of abstracting individual problem characteristics to a mathematical level. The connection to applications motivates us to tackle many of the conceptual difficulties that arise in our study of the mathematics. |
| M. Anitescu and R. Serban, "A Sparse Superlinearly Convergent SQP with Applications to Two-dimensional Shape Optimization," Preprint ANL/MCS-P706-0198, Feb. 1998. | Discretization of optimal shape design problems leads to very large nonlinear optimization problems. For attaining maximum computational efficiency, a sequential quadratic programming (SQP) algorithm should achieve superlinear convergence while preserving sparsity and convexity of the resulting quadratic programs. Most classical SQP approaches violate at least one of the requirements. We show that, for a very large class of optimization problems, one can design SQP algorithms that satisfy all three requirements. The improvements in computational efficiency are demonstrated for a cam design problem. |
| M. E. Papka, R. Stevens, and M. Szymanski, "Collaborative Virtual Reality Environments for Computational Science and Design," in Computer Aided Design of High Temperature Materials, A. Pechenik, R. K. Kalia, and P. Vashishta, eds., Oxford University Press, 1998. Also Preprint ANL/MCS-P707-0298, Feb. 1998. | We are developing a networked, multi-user, virtual-reality-based collaborative environment coupled to one or more petaFLOPs computers, enabling the interactive simulation of 10^9 atom systems. The purpose of this work is to explore the requirements for this coupling. Through the design, development, and testing of such systems, we hope to gain knowledge that will allow computational scientists to discover and analyze their results more quickly and in a more intuitive manner. |
| M. C. Ferris, M. P. Mesnier, and J. J. Moré, "NEOS and CONDOR: Solving Optimization Problems over the Internet," Preprint ANL/MCS-P708-0398, March 1998. | We discuss the use of Condor, a distributed resource management system, as a provider of computational resources for NEOS, an environment for solving optimization problems over the Internet. We also describe how problems are submitted and processed by NEOS, and then scheduled and solved by Condor on available (idle) workstations. |
| P. Verlinden, D. M. Potts, and J. N. Lyness, "Error Expansions for Multidimensional Trapezoidal Rules with Sidi Transformations," Numerical Algorithms 16 (1997), pp. 321-347. Also Preprint ANL/MCS-P709-0298, June 1998. | In 1993, Sidi introduced a set of trigonometric transformations x = \psi(t) that improve the effectiveness of the one-dimensional trapezoidal quadrature rule for a finite interval. In this paper, we extend Sidi's approach to product multidimensional quadrature over [0,1]^N. We establish the Euler-Maclaurin expansion for this rule, both in the case of a regular integrand function f(x) and in the cases when f(x) has homogeneous singularities confined to vertices. |
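Sidi's sin^m transformation is easy to illustrate for m = 2, where it reduces to the map psi(t) = t - sin(2*pi*t)/(2*pi). The sketch below (an illustration, not code from the paper) compares the plain and transformed one-dimensional trapezoidal rules on a smooth, nonperiodic integrand:

```python
import math

def trapezoid(f, n):
    """Composite trapezoidal rule for f on [0, 1] with n panels."""
    h = 1.0 / n
    s = 0.5 * (f(0.0) + f(1.0))
    for k in range(1, n):
        s += f(k * h)
    return s * h

def psi2(t):
    """Sidi's sin^2 transformation: the normalized antiderivative of
    sin^2(pi*u), i.e. psi(t) = t - sin(2*pi*t)/(2*pi)."""
    return t - math.sin(2.0 * math.pi * t) / (2.0 * math.pi)

def dpsi2(t):
    """psi2'(t) = 1 - cos(2*pi*t); vanishes at both endpoints."""
    return 1.0 - math.cos(2.0 * math.pi * t)

def trapezoid_sidi(f, n):
    """Trapezoidal rule applied after the substitution x = psi2(t)."""
    return trapezoid(lambda t: f(psi2(t)) * dpsi2(t), n)

# Integrate e^x on [0, 1]; exact value is e - 1.
exact = math.e - 1.0
err_plain = abs(trapezoid(math.exp, 32) - exact)
err_sidi = abs(trapezoid_sidi(math.exp, 32) - exact)
```

Because the transformed integrand and several of its derivatives vanish at the endpoints, the leading Euler-Maclaurin terms drop out and the transformed rule is dramatically more accurate for the same number of panels.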
| W. L. Wan, T. F. Chan, and B. Smith, "An Energy-Minimizing Interpolation for Robust Multigrid Methods," Preprint ANL/MCS-P710-0298, May 1998. | We propose a robust interpolation for multigrid based on the concepts of energy minimization and approximation. The formulation is general; it can be applied in any dimension. The analysis for one dimension proves that the convergence rate of the resulting multigrid method is independent of the coefficient of the underlying PDE, in addition to being independent of the mesh size. We demonstrate numerically the effectiveness of the multigrid method in two dimensions by applying it to a discontinuous coefficient problem and an oscillatory coefficient problem. We also show using a one-dimensional Helmholtz problem that the energy minimization principle can be applied to solving elliptic problems that are not positive definite. |
| B. F. Smith, "The Transition of Numerical Software: From Nuts-and-Bolts to Abstraction," SIGNUM Newsletter, Jan. 1998. Also Preprint ANL/MCS-P712-0598, May 1998. | Traditionally, much of the numerical analysis community has focused on issues of error bounds, convergence rates, and time and space complexity of numerical algorithms, not on robust, reusable software implementations. Until recently, many of the computer science advances in software engineering rarely filtered into numerical codes. With the growing complexity of (1) parallel computers, (2) numerical algorithms, and (3) application physics that must be modeled, the community has finally begun to realize the need for new techniques to manage the complexity of the codes. Thus, the entire community has entered a learning curve; some groups are well advanced on the curve and develop and use sophisticated class libraries and frameworks, while others are still struggling with dynamic memory allocation and data structures. This article will discuss some of the issues involved in the transition from a Fortran 77 nuts-and-bolts approach to developing numerical code to a higher level, abstract, object-oriented methodology and how our PETSc development team is trying to ease that transition. In addition, it will discuss two limitations in the Fortran 90 syntax that make it difficult to take full advantage of abstraction when programming purely in Fortran 90. |
| D. Diachin, L. Freitag, D. Heath, J. Herzog, and W. Michels, "Interactive Simulation and Visualization of Massless, Massed, and Evaporating Particles," Preprint ANL/MCS-P713-0498, May 1998. For Figures 2-4, contact beumer@mcs.anl.gov. | Most software packages available for particle tracing focus on visualizing steady or unsteady vector fields by using massless particle trajectories. For many applications, however, the use of massed and evaporating particles would provide a model of physical processes that could be used in product testing or design. In this article we describe the TrackPack toolkit, which provides an integrating interface for computing massless, massed, and evaporating particle trajectories in steady flow. In all cases, we assume noncoupled models and compute particle trajectories through an existing vector field by numerically integrating with forward Euler, fourth-order Runge-Kutta, or an analytic streamline calculation. The TrackPack software effort was motivated by an industrial application to model pollution control systems in industrial boilers. We briefly describe the project and the visualization environment, and we demonstrate the necessity for massed, evaporating models in this application. |
| T. R. Canfield, "Simulation and Visualization of Mechanical Systems in Immersive Virtual Environments," in Proc. Engineering Mechanics: A Force for the 21st Century, May 17-20, La Jolla, CA, 1998. Also Preprint ANL/MCS-P714-0498, April 1998. | A prototype for doing real-time simulation of mechanical systems in immersive virtual environments has been developed to run in the CAVE and on the ImmersaDesk at Argonne National Laboratory. This system has three principal software components: a visualization component for rendering the model and providing a user interface, communications software, and mechanics simulation software. The system can display the three-dimensional objects in the CAVE and project various scalar fields onto the exterior surface of the objects during real-time execution. |
| M. Anitescu, F. A. Potra, and D. E. Stewart, "Time-Stepping for Three-Dimensional Rigid Body Dynamics," Preprint ANL/MCS-P716-0498, April 1998. | Traditional methods for simulating rigid body dynamics involve determining the current contact arrangement (e.g., each contact is either a "rolling" or "sliding" contact). This approach is most clearly seen in the work of Pfeiffer and Glocker. However, there has been controversy about the status of rigid body problems in this area that do not have solutions; the most famous, if not the earliest, example is due to Painlevé. Recently, a number of time-stepping methods have been developed to overcome this difficulty. These time-stepping methods use integrals of the forces over each time step rather than the forces themselves, and the newest of these methods are formulated in terms of complementarity problems. Such time-stepping procedures permit the simulation of rigid body dynamics with friction; existence of solutions to the continuous problem can be proved by methods based on measure differential inclusions. There are, however, a number of limitations to these methods. In this paper several variants are discussed and their essential properties proven. |
| | Parallel numerical software based on the message-passing model is enormously complicated. This paper introduces a set of techniques to manage the complexity, while maintaining high efficiency and ease of use. The PETSc 2.0 package uses object-oriented programming to conceal the details of the message passing, without concealing the parallelism, in a high-quality set of numerical software libraries. In fact, the programming model used by PETSc is also the most appropriate for NUMA shared-memory machines, since they require the same careful attention to memory hierarchies as do distributed-memory machines. Thus, the concepts discussed are appropriate for all scalable computing systems. The PETSc libraries provide many of the data structures and numerical kernels required for the scalable solution of PDEs, offering performance portability. |
| R. Thakur, W. Gropp, and E. Lusk, "A Case for Using MPI's Derived Datatypes to Improve I/O Performance," to appear in Proceedings of SC98: High Performance Networking and Computing, Nov. 1998. Also Preprint ANL/MCS-P717-0598, June 1998. | MPI-IO, the I/O part of the MPI-2 standard, is a promising new interface for parallel I/O. A key feature of MPI-IO is that it allows users to access several noncontiguous pieces of data from a file with a single I/O function call by defining file views with derived datatypes. We explain how critical this feature is for high performance, why users must create and use derived datatypes whenever possible, and how it enables implementations to perform optimizations. In particular, we describe two optimizations our MPI-IO implementation, ROMIO, performs: data sieving and collective I/O. We present performance results on five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000. |
| S. Brunett, K. Czajkowski, I. Foster, C. Kesselman, J. Leigh, and S. Tuecke, "Application Experiences with the Globus Toolkit," Preprint ANL/MCS-P718-0698, June 1998. | The Globus grid toolkit is a collection of software components designed to support the development of applications for high-performance distributed computing environments, or "computational grids". The Globus toolkit is an implementation of a "bag of services" architecture, which provides application and tool developers not with a monolithic system but rather with a set of stand-alone services. Each Globus component provides a basic service, such as authentication, resource allocation, information, communication, fault detection, and remote data access. Different applications and tools can combine these services in different ways to construct "grid-enabled" systems. |
| P. Stelling, I. Foster, C. Kesselman, C. Lee, and G. von Laszewski, "A Fault Detection Service for Wide Area Distributed Computations," Preprint ANL/MCS-P719-0698, June 1998. | The potential for faults in distributed computing systems is a significant complicating factor for application developers. While a variety of techniques exist for detecting and correcting faults, the implementation of these techniques in a particular context can be difficult. Hence, we propose a fault detection service designed to be incorporated, in a modular fashion, into distributed computing systems, tools, or applications. This service uses well-known techniques based on unreliable fault detectors to detect and report component failure, while allowing the user to trade off timeliness of reporting against false positive rates. We describe the architecture of this service, report on experimental results that quantify its cost and accuracy, and describe its use in two applications, monitoring the status of system components of the GUSTO computational grid testbed and as part of the NetSolve network-enabled numerical solver. |
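The timeliness/false-positive tradeoff of an unreliable failure detector can be shown with a toy heartbeat monitor. This is a hypothetical sketch, not the service's actual interface: a component is suspected once no heartbeat has arrived within a timeout, so a short timeout reports failures quickly but mislabels slow components, while a long timeout is accurate but late.

```python
class HeartbeatMonitor:
    """Toy unreliable failure detector based on heartbeat timeouts
    (hypothetical interface for illustration)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def heartbeat(self, component, now):
        """Record a heartbeat from `component` at time `now`."""
        self.last_seen[component] = now

    def suspects(self, now):
        """Components whose last heartbeat is older than the timeout."""
        return sorted(c for c, t in self.last_seen.items()
                      if now - t > self.timeout)

# Two monitors with different timeouts observe the same heartbeat stream.
fast = HeartbeatMonitor(timeout=2.0)   # timely but false-positive prone
slow = HeartbeatMonitor(timeout=10.0)  # conservative but slow to report
for m in (fast, slow):
    m.heartbeat("node-1", now=0.0)
    m.heartbeat("node-2", now=0.0)
    m.heartbeat("node-1", now=5.0)   # node-2's heartbeat is merely delayed
# At time 6.0 the fast monitor already suspects node-2; the slow one does not.
```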
| H. G. Kaper and S. Tipei, "Manifold Compositions, Music Visualization, and Scientific Sonification in an Immersive Virtual-Reality Environment," Preprint ANL/MCS-P720-0798, July 1998. | An interdisciplinary project encompassing sound synthesis, music composition, sonification, and visualization of music is facilitated by the high-performance computing capabilities and the virtual-reality environments available at Argonne National Laboratory. The paper describes the main features of the project's centerpiece, DIASS (Digital Instrument for Additive Sound Synthesis); "A.N.L.-folds", an equivalence class of compositions produced with DIASS; and application of DIASS in two experiments in the sonification of complex scientific data. Some of the larger issues connected with this project, such as the changing ways in which both scientists and composers perform their tasks, are briefly discussed. |
| M. Garbey and H. G. Kaper,
"Asymptotic-Numerical Study of Supersensitivity for Generalized Burgers
Equations," SIAM J. on Scientific Computing 22, no. 1
(2000), pp. 368-385. Also Preprint ANL/MCS-P721-0798,
July 1998.
|
This article addresses analytical and computational issues related to the solution of Burgers' equation, -\epsilon u_{xx} + u_t + uu_x = 0 on (-1,1), subject to the boundary conditions u(-1) = 1+\delta, u(1) = -1, and its generalization to two dimensions, -\epsilon\Delta u + u_t + uu_x + uu_y = 0 on (-1,1) x (-\pi,\pi), subject to the boundary conditions u|_{x=-1} = 1 + \delta, u|_{x=1} = -1, with 2\pi periodicity in y. The perturbation parameters \delta and \epsilon are arbitrarily small, positive, and independent; as they approach 0, they satisfy the asymptotic order relation \delta = O_s(e^{-a/\epsilon}) for some constant a \in (0,1). The solutions of these convection-dominated viscous conservation laws exhibit a transition layer in the interior of the domain, whose position as t -> \infty is supersensitive to the boundary perturbation. Algorithms are presented for computing the position of the transition layer at steady state. The algorithms generalize to viscous conservation laws with a convex nonlinearity and are scalable in a parallel computing environment. |
| L. A. Freitag and C. Ollivier-Gooch, "A Cost/Benefit Analysis of Simplicial Mesh Improvement Techniques as Measured by Solution Efficiency," Preprint ANL/MCS-P722-0598, July 1998. | The quality of finite element and finite volume meshes has long been known to affect both the efficiency and the accuracy of the numerical solution of application problems. To improve the quality of these meshes, several researchers have devised new algorithms based on local reconnection schemes, node smoothing, and adaptive refinement or coarsening. In each case, the edges, vertices, or elements of the mesh are individually evaluated to determine whether performing the local operation improves the mesh. Therefore, these methods typically incur an O(N) computational cost, where N is the number of vertices in the mesh. This is a significant cost as N increases, and often only anecdotal evidence is given to demonstrate the benefit of these techniques in terms of solution efficiency for a particular application or solver. In this paper, we provide a deeper analysis of the tradeoffs associated with the cost of mesh improvement in terms of solution efficiency. We consider both finite element and finite volume discretization techniques, a number of different solvers, and a variety of application problems. The issue of solution accuracy will be addressed in a later paper. |
| R. Thakur, W. Gropp, and E. Lusk, "Data Sieving and Collective I/O in ROMIO," Preprint ANL/MCS-P723-0898, August 1998. | The I/O access patterns of parallel programs often consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access a noncontiguous data set with a single I/O function call. This feature provides MPI-IO implementations an opportunity to optimize data access. We describe how our MPI-IO implementation, ROMIO, delivers high performance in the presence of noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how one can implement these optimizations portably on multiple machines and file systems, control their memory requirements, and also achieve high performance. We demonstrate the performance and portability with performance results for three applications---an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC)---on five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000. |
| C.-J. Lin and J. J. Moré, "Newton's Method for Large Bound-Constrained Optimization Problems," SIAM J. on Optimization 9 (1999), pp. 1100-1127. Also Preprint ANL/MCS-P724-0898, Aug. 1998, rev. April 1999. | We analyze a trust region version of Newton's method for bound-constrained problems. Our approach relies on the geometry of the feasible set, not on the particular representation in terms of constraints. The convergence theory holds for linearly constrained problems, and yields global and superlinear convergence without assuming either strict complementarity or linear independence of the active constraints. We also show that the convergence theory leads to an efficient implementation for large bound-constrained problems. |
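The role of the feasible-set geometry can be seen in the simplest relative of this method: gradient projection onto the box of bounds. The sketch below is only an illustration of that projection step, far simpler than the paper's trust-region Newton algorithm, and uses a made-up quadratic example.

```python
def project(x, lo, hi):
    """Euclidean projection onto the box {v : lo <= v <= hi},
    applied componentwise."""
    return [min(max(v, l), h) for v, l, h in zip(x, lo, hi)]

def projected_gradient(grad, x0, lo, hi, step=0.1, iters=200):
    """Gradient projection for min f(x) subject to lo <= x <= hi.
    Each iterate stays feasible because the step is projected back
    onto the box; no explicit constraint bookkeeping is needed."""
    x = project(x0, lo, hi)
    for _ in range(iters):
        g = grad(x)
        x = project([xi - step * gi for xi, gi in zip(x, g)], lo, hi)
    return x

# min (x-2)^2 + (y+1)^2 over the box [0,1] x [0,1]:
# both bounds are active at the solution (1, 0).
grad = lambda v: [2.0 * (v[0] - 2.0), 2.0 * (v[1] + 1.0)]
x_star = projected_gradient(grad, [0.5, 0.5], [0.0, 0.0], [1.0, 1.0])
```

The unconstrained minimizer (2, -1) lies outside the box, and the projection pins the iterates to the active bounds automatically.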
| H. G. Kaper and P. Takác, "Bifurcating Vortex Solutions of the Complex Ginzburg-Landau Equation," Discrete and Continuous Dynamical Systems 5, no. 4 (1999), pp. 871-880. Also Preprint ANL/MCS-P725-0998. | It is shown that the complex Ginzburg-Landau (CGL) equation on the real line admits nontrivial 2\pi-periodic vortex solutions that have 2n simple zeros ("vortices") per period. The vortex solutions bifurcate from the trivial solution and inherit their zeros from the solution of the linearized equation. This result rules out the possibility that the vortices are determining nodes for vortex solutions of the CGL equation. |
| L. Freitag, M. Jones, and P. Plassmann, "Mesh Component Design and Software Integration within SUMAA3d," Preprint ANL/MCS-P726-0998, Sept. 1998. | The requirements of distributed-memory applications that use mesh management software tools are diverse, and building software that meets these requirements represents a considerable challenge. In this paper we discuss design requirements for a general, component-based approach to mesh management for use within the context of solving PDE applications on parallel computers. We describe recent efforts with the SUMAA3d package motivated by a component-based approach and show how these efforts have considerably improved both the flexibility and the usability of this software. |
| S. Balay, B. Gropp, L. C. McInnes, and B. Smith, "A Microkernel Design for Component-based Parallel Numerical Software Systems," Preprint ANL/MCS-P727-0998. | What is the minimal software
infrastructure and what type of conventions are needed to simplify development of
sophisticated parallel numerical application codes using a variety of software components
that are not necessarily available as source code? We propose an opaque object-based
model where the objects are dynamically loadable from the file system or network.
The microkernel required to manage such a system needs to include only a small set of basic services.
We are experimenting with these ideas in the context of extensible numerical software within the ALICE (Advanced Large-scale Integrated Computational Environment) project, where we are building the microkernel to manage the interoperability among various tools for large-scale scientific simulations. This paper presents some preliminary observations and conclusions from our work with microkernel design. |
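The idea of opaque, dynamically loadable components can be sketched with Python's own dynamic loading. This toy registry (a hypothetical interface, not the ALICE microkernel) locates components by name and loads them on demand, so callers never touch their source:

```python
import importlib

class Microkernel:
    """Toy component registry: components are resolved by name and
    loaded lazily, then cached. Illustrative only."""

    def __init__(self):
        self._cache = {}

    def load(self, module_name):
        """Return the named component, loading it on first use."""
        if module_name not in self._cache:
            self._cache[module_name] = importlib.import_module(module_name)
        return self._cache[module_name]

kernel = Microkernel()
json_component = kernel.load("json")        # loaded dynamically by name
data = json_component.loads('{"solver": "gmres"}')
```

Repeated loads return the cached object, so components behave as shared, opaque services rather than as source-level dependencies.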
| J. L. Tilson, M. Minkoff, A. F. Wagner, R. Shepard, P. Sutton, R. J. Harrison, R. A. Kendall, A. T. Wong, "High-Performance Computational Chemistry: Hartree-Fock Electronic Structure Calculations on Massively Parallel Processors," The Int'l J. of High Performance Computing Applications 13(4) (Winter 1999), pp. 291-302. Also Preprint ANL/MCS-P728-0998, Sept. 1998. | The parallel performance of the NWChem version 1.2a parallel direct-SCF code has been characterized on five massively parallel supercomputers (IBM SP, Kendall Square KSR-2, CRAY T3D and T3E, and Intel Touchstone Delta) using single-point energy calculations on seven molecules of varying size (up to 389 atoms) and composition (first-row atoms, halogens, and transition metals). We compare the performance using both replicated-data and distributed-data algorithms and the original McMurchie-Davidson and recently incorporated TEXAS integrals packages. |
| J. S. Mullen and P. F. Fischer, "Filtering Techniques for Complex Geometry Fluid Flows," Comm. in Num. Meth. in Eng. 15 (1999), pp. 9-18. Also Preprint ANL/MCS-P729-1098, Nov. 1998. | We develop a class of filters based upon the numerical solution of high-order elliptic problems in R^d which allow for independent determination of order and cut-off wave number and which default to classical Fourier-based filters in homogeneous domains. However, because they are based on the solution of a PDE, the present filters are not restricted to applications in tensor-product based geometries as is generally the case for Fourier-based filters. The discrete representation of the filtered output is constructed from a Krylov space generated in solving a well-conditioned system arising from a low-order PDE. |
| P. F. Fischer, N. I. Miller, and H. M. Tufo, "An Overlapping Schwarz Method for Spectral Element Simulation of Three-Dimensional Incompressible Flows," Preprint ANL/MCS-P730-1098, November 1998. | As the sound speed is infinite for incompressible flows, computation of the pressure constitutes the stiffest component in the time advancement of unsteady simulations. For complex geometries, efficient solution is dependent upon the availability of fast solvers for sparse linear systems. In this paper we develop a Schwarz preconditioner for the spectral element method using overlapping subdomains for the pressure. These local subdomain problems are derived from tensor products of one-dimensional finite element discretizations and admit use of fast diagonalization methods based upon matrix-matrix products. In addition, we use a coarse grid projection operator whose solution is computed via a fast parallel direct solver. The combination of overlapping Schwarz preconditioning and fast coarse grid solver provides as much as a fourfold reduction in simulation time over previously employed methods based upon deflation for parallel solution of multi-million grid point flow problems. |
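The overlapping-Schwarz idea underlying this preconditioner is easiest to see in one dimension. The sketch below (an illustration, not the authors' spectral-element method) solves a Poisson problem by alternating exact Dirichlet solves on two overlapping subdomains, each subdomain taking its boundary values from the current global iterate:

```python
def thomas(a, b, c, d):
    """Solve a tridiagonal system; a: sub-, b: main, c: super-diagonal."""
    n = len(d)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def solve_subdomain(u, i0, i1, f, h):
    """Exact Dirichlet solve of -u'' = f on interior indices i0..i1,
    with boundary values taken from the current global iterate u."""
    m = i1 - i0 + 1
    d = [h * h * f for _ in range(m)]
    d[0] += u[i0 - 1] if i0 > 0 else 0.0
    d[-1] += u[i1 + 1] if i1 < len(u) - 1 else 0.0
    u[i0:i1 + 1] = thomas([-1.0] * m, [2.0] * m, [-1.0] * m, d)

# -u'' = 1 on (0,1), u(0) = u(1) = 0; exact solution u(x) = x(1-x)/2,
# which the second-difference scheme reproduces exactly at the nodes.
n, f = 31, 1.0
h = 1.0 / (n + 1)
u = [0.0] * n
for _ in range(60):                    # alternating Schwarz sweeps
    solve_subdomain(u, 0, 19, f, h)    # left subdomain
    solve_subdomain(u, 11, 30, f, h)   # right subdomain (overlaps 11..19)
err = max(abs(u[i] - (i + 1) * h * (1.0 - (i + 1) * h) / 2.0)
          for i in range(n))
```

The error contracts by a fixed factor per sweep, with the rate governed by the amount of overlap, which is why generous overlap pays off in the preconditioner.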
| P. Hovland, B. Norris, L. Roh, and B. Smith, "Developing a Derivative-Enhanced Object-Oriented Toolkit for Scientific Computations," Preprint ANL/MCS-P731-1098, November 1998. | We describe the development of a differentiated version of PETSc, an object-oriented toolkit for the parallel solution of scientific problems modeled by partial differential equations. Traditionally, automatic differentiation tools are applied to scientific applications to produce derivative-augmented code, which can then be used for sensitivity analysis, optimization, or parameter estimation. Scientific toolkits play an increasingly important role in developing large-scale scientific applications. By differentiating PETSc, we provide accurate derivative computations in applications implemented using the toolkit. In addition to using automatic differentiation to generate a derivative enhanced version of PETSc, we exploit the component-based organization of the toolkit, applying high-level mathematical insight to increase the accuracy and efficiency of derivative computations. |
| R. Thakur, W. Gropp, and E. Lusk, "On Implementing MPI-IO Portably and with High Performance," Preprint ANL/MCS-P732-1098, October 1998. | We discuss the issues involved in
implementing MPI-IO portably on multiple machines and file systems and also achieving high
performance. One way to implement MPI-IO portably is to implement it on top of the
basic Unix I/O functions (open, lseek, read,
write, and close), which are themselves portable.
We argue that this approach has limitations in both functionality and performance.
We instead advocate an implementation approach that combines a large portion of portable
code and a small portion of code that is optimized separately for different machines and
file systems. We have used such an approach to develop a high-performance, portable
MPI-IO implementation, called ROMIO. In addition to basic I/O functionality, we consider the issues of supporting other MPI-IO features, such as 64-bit file sizes, noncontiguous accesses, collective I/O, asynchronous I/O, consistency and atomicity semantics, user-supplied hints, shared file pointers, portable data representation, file preallocation, and some miscellaneous features. We describe how we implemented each of these features on various machines and file systems. The machines we consider are the HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, SGI Origin2000, and networks of workstations; and the file systems we consider are HP HFS, IBM PIOFS, Intel PFS, NEC SFS, SGI XFS, NFS, and any general Unix file system (UFS). We also present our thoughts on how a file system can be designed to better support MPI-IO. We provide a list of features desired from a file system that would help in implementing MPI-IO correctly and with high performance. |
| R. Overbeek, M. Fonstein, M. D'Souza, N. Maltsev, and G. D. Pusch, "The Use of Gene Clusters to Infer Functional Coupling," Proc. Natl. Acad. Sci. 96 (March 1999), pp. 2896-2901. Also Preprint ANL/MCS-P733-1098, Oct. 1998. | Earlier, we presented evidence that it is possible to predict functional coupling between genes based on conservation of gene clusters between genomes. With the rapid increase in availability of prokaryotic sequence data, it has become possible to verify and apply the technique. In this paper, we extend our characterization of the parameters that determine the utility of the approach, and we generalize the approach in a way that supports detection of common classes of functionally coupled genes (e.g., transport and signal transduction clusters). Now that the analysis includes over 30 complete or nearly complete genomes, it has become clear that this approach will play a significant role in supporting efforts to assign functionality to the remaining uncharacterized genes in sequenced genomes. |
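The core heuristic, conserved gene clustering, can be sketched as a pair-counting exercise. The toy below (illustrative gene names and thresholds, not the authors' pipeline) flags unordered gene pairs that occur close together on several genomes:

```python
from collections import Counter

def close_pairs(genome, d=1):
    """Unordered gene pairs lying within d positions of each other.
    A genome is modeled as a list of gene names in chromosomal order."""
    pairs = set()
    for i, g in enumerate(genome):
        for j in range(i + 1, min(i + 1 + d, len(genome))):
            pairs.add(frozenset((g, genome[j])))
    return pairs

def coupled(genomes, min_genomes):
    """Pairs whose clustering is conserved in at least min_genomes
    genomes -- candidates for functional coupling."""
    counts = Counter()
    for genome in genomes:
        counts.update(close_pairs(genome))
    return {tuple(sorted(p)) for p, c in counts.items() if c >= min_genomes}

# Toy data: trpA/trpB cluster in all three genomes; other neighbors vary.
genomes = [
    ["trpA", "trpB", "recA", "gyrB"],
    ["gyrB", "trpA", "trpB", "polA"],
    ["recA", "polA", "trpB", "trpA"],
]
hits = coupled(genomes, min_genomes=3)
```

Pairs that stay adjacent across unrelated genomes are unlikely to do so by chance, which is what licenses the functional-coupling inference.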
| H. M. Tufo and P. F. Fischer, "Fast Parallel Direct Solvers for Coarse Grid Problems," Preprint ANL/MCS-P734-1198, Nov. 1998. | We develop a fast direct solver for parallel solution of "coarse grid" problems, Ax = b, such as arise when domain decomposition or multigrid methods are applied to elliptic partial differential equations in d space dimensions. The approach is based upon a (quasi-)sparse factorization of the inverse of A. If A is n x n and the number of processors is P, our approach requires O(n^g log_2 P) time for communication and O(n^{1+g}/P) time for computation, where g = (d-1)/d. Results from a 512-node Intel Paragon show that our algorithm compares favorably to more commonly used approaches, which require O(n log_2 P) time for communication and O(n^{1+g}) or O(n^2/P) time for computation. Moreover, for leading-edge multicomputer systems with thousands of processors and n = P (i.e., communication-dominated solves), we expect our algorithm to be markedly superior, as it achieves substantially reduced message volume and arithmetic complexity over competing methods while retaining minimal message startup cost. |
| J. Michalakes, J. Dudhia, D. Gill, J. Klemp, and W. Skamarock, "Design of a Next-Generation Regional Weather Research and Forecast Model," Preprint ANL/MCS-P735-1198, Nov. 1998. | The Weather Research and Forecast (WRF) model is a new model development effort undertaken jointly by the National Center for Atmospheric Research (NCAR), the National Oceanic and Atmospheric Administration (NOAA), and a number of collaborating institutions and university scientists. The model is intended for use by operational NWP and university research communities, providing a common framework for idealized dynamical studies, full physics numerical weather prediction, air-quality simulation, and regional climate. It will eventually supersede large, well-established but aging regional models now maintained by the participating institutions. The WRF effort includes re-engineering the underlying software architecture to produce a modular, flexible code designed from the outset to provide portable performance across diverse computing architectures. This paper outlines key elements of the WRF software design. |
| W. Gropp and E. Lusk, "PVM and MPI Are Completely Different," Preprint ANL/MCS-P737-1198, Dec. 1998. | PVM and MPI are often compared. These comparisons usually start with the unspoken assumption that PVM and MPI represent different solutions to the same problem. In this paper we show that, in fact, the two systems often are solving different problems. In cases where the problems do match but the solutions chosen by PVM and MPI are different, we explain the reasons for the differences. Usually such differences can be traced to explicit differences in the goals of the two systems, their origins, or the relationship between their specifications and their implementations. |
| I. Foster, N. T. Karonis, C. Kesselman, and S. Tuecke, "Managing Security in High-Performance Distributed Computations," Cluster Computing 1(1) (1998), pp. 95-107. Also Preprint ANL/MCS-P741-1298, Dec. 1998. | We describe a software infrastructure designed to support the development of applications that use high-speed networks to connect geographically distributed supercomputers, databases, and scientific instruments. Such applications may need to operate over open networks and access valuable resources, and hence can require mechanisms for ensuring integrity and confidentiality of communications and for authenticating both users and resources. Yet security solutions developed for traditional client-server applications do not provide direct support for the distinctive program structures, programming tools, and performance requirements encountered in these applications. To address these requirements, we are developing a security-enhanced version of a communication library called Nexus, which is then used to provide secure versions of various parallel libraries and languages, including the popular Message Passing Interface. These tools support the wide range of process creation mechanisms and communication structures used in high-performance computing. They also provide a fine degree of control over what, where, and when security mechanisms are applied. In particular, a single application can mix secure and nonsecure communication, allowing the programmer to make fine-grained security/performance tradeoffs. We present results that quantify the performance of our infrastructure. |
| R. Thakur, W. Gropp, and E. Lusk, "Optimizing Noncontiguous Accesses in MPI-IO," Preprint ANL/MCS-P742-0299, Feb. 1999. Revised Nov. 2000. | The I/O access patterns of many parallel applications consist of accesses to a large number of small, noncontiguous pieces of data. If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically. To avoid this problem, MPI-IO allows users to access noncontiguous data with a single I/O function call, unlike in Unix I/O. In this paper, we explain how critical this feature of MPI-IO is for high performance and how it enables implementations to perform optimizations. We first provide a classification of the different ways of expressing an application's I/O needs in MPI-IO—we classify them into four levels, called level 0 through level 3. We demonstrate that, for applications with noncontiguous access patterns, the I/O performance improves dramatically if users write their applications to make level-3 requests (noncontiguous, collective) rather than level-0 requests (Unix style). We then describe how our MPI-IO implementation, ROMIO, delivers high performance for noncontiguous requests. We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes. We describe how we have implemented these optimizations portably on multiple machines and file systems, controlled their memory requirements, and also achieved high performance. We demonstrate the performance and portability with performance results for three applications—an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC)—on five different parallel machines: HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000. |
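The benefit of the higher request levels can be illustrated with simple request-count arithmetic. The sketch below is an assumption-laden toy model, not ROMIO's actual cost model: it considers an n x n row-major array split column-wise among p processes, so each process's block consists of n noncontiguous row segments in the file.

```python
def request_counts(n, p):
    """Toy model of file-access counts for an n x n row-major array
    split column-wise among p processes (illustrative only)."""
    level0 = n * p   # Unix-style: one independent request per row segment
    level1 = n * p   # collective calls, but still one contiguous piece each
    level2 = p       # derived datatype: one noncontiguous request per process
    level3 = p       # collective + noncontiguous: two-phase I/O can merge
                     # these into a few large contiguous file accesses
    return level0, level1, level2, level3

level0, level1, level2, level3 = request_counts(1024, 16)
```

Under this toy model a level-3 access pattern replaces 16,384 small requests with 16 large ones, which is exactly the opportunity that data sieving and collective I/O exploit.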
| P. Wang, S. Balay, K. Sepehrnoori, J. Wheeler, J. Abate, B. Smith, and G. A. Pope, "A Fully Implicit Parallel EOS Compositional Simulator for Large Scale Reservoir Simulation," Proc. 1999 Soc. of Petroleum Engineers, 15th Reservoir Simulation Symp., 1999. Also Preprint ANL/MCS-P743-1298, Dec. 1998. | A fully implicit parallel equation-of-state (EOS) compositional simulator for large-scale reservoir simulation is presented. This simulator is developed under the framework named IPARS (Integrated Parallel Accurate Reservoir Simulator) and is constructed using a Newton-type formulation. The Peng-Robinson EOS is used for the hydrocarbon phase behavior calculations. The linear solvers from the PETSc package (Portable Extensible Toolkit for Scientific Computation) are used for the solution of the underlying linear equations. The framework provides input/output, table lookups, Fortran array memory allocation, domain decomposition, and message passing between processors for updating physical properties in mass-balance equations in overlapping regions. PETSc handles communications between processors needed for the linear solver. Many test runs were performed with up to four million gridblocks for a dry-gas injection process on an IBM SP machine and with half a million gridblocks on a cluster of 16 PCs. Results indicate that the scalability of the simulator is very good. The linear solver takes around half of the total computational time for homogeneous reservoirs. For layered heterogeneous reservoirs, the linear solver took a larger fraction of the total computational time as the permeability contrast increased. The time for the communication between processors for updating the flow equations is insignificant. The PC cluster is roughly a factor of two slower than the SP for parallel runs, which is very encouraging. This factor is strongly related to the hardware configuration of the computers, which is detailed in the paper. |
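For a pure component, the Peng-Robinson EOS mentioned above reduces to a cubic in the compressibility factor Z. The sketch below is a minimal single-component illustration, not the simulator's multicomponent flash calculation; the methane critical constants are textbook values, and the cubic is solved for the gas root by Newton iteration.

```python
import math

def peng_robinson_Z(T, P, Tc, Pc, omega, R=8.314):
    """Gas-root compressibility factor from the Peng-Robinson EOS
    (single-component sketch; Newton iteration on the cubic in Z)."""
    kappa = 0.37464 + 1.54226 * omega - 0.26992 * omega**2
    alpha = (1 + kappa * (1 - math.sqrt(T / Tc)))**2
    a = 0.45724 * R**2 * Tc**2 / Pc * alpha
    b = 0.07780 * R * Tc / Pc
    A = a * P / (R * T)**2
    B = b * P / (R * T)
    # cubic: Z^3 - (1-B)Z^2 + (A - 3B^2 - 2B)Z - (AB - B^2 - B^3) = 0
    f = lambda Z: Z**3 - (1 - B)*Z**2 + (A - 3*B**2 - 2*B)*Z - (A*B - B**2 - B**3)
    df = lambda Z: 3*Z**2 - 2*(1 - B)*Z + (A - 3*B**2 - 2*B)
    Z = 1.0  # start near the ideal-gas root
    for _ in range(50):
        Z -= f(Z) / df(Z)
    return Z

# methane at 300 K and 5 MPa (Tc = 190.6 K, Pc = 4.599 MPa, omega = 0.011)
Z = peng_robinson_Z(300.0, 5e6, 190.6, 4.599e6, 0.011)
```

In the fully implicit simulator, evaluations of this kind feed the phase-behavior terms of the mass-balance residuals at every Newton step.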
| D. K. Kaushik, D. E. Keyes, and B. F. Smith, "Newton-Krylov-Schwarz Methods for Aerodynamics Problems: Compressible and Incompressible Flows on Unstructured Grids," in Proc. 11th Int. Conf. on Domain Decomposition Methods, C-H. Lai, P. Bjorstad, M. Cross, O. Widlund, eds., DDM.org (1998). Also Preprint ANL/MCS-P745-1298, Dec. 1998. | We review and extend to the compressible regime an earlier parallelization of an implicit incompressible unstructured Euler code, and solve for flow over an M6 wing in subsonic, transonic, and supersonic regimes. While the parallelization philosophy of the compressible case is identical to the incompressible, we focus here on the nonlinear and linear convergence rates, which vary in different physical regimes, and on comparing the performance of currently important computational platforms. |
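The outer loop of a Newton-Krylov-Schwarz method is ordinary Newton iteration; what distinguishes the family is the inner linear solve. The sketch below shows only that outer loop on a toy 2x2 system, with the preconditioned Krylov (Schwarz) linear solve replaced by a direct Cramer's-rule solve; this is an illustrative simplification, not the paper's solver.

```python
def newton_solve(F, J, x, tol=1e-10, maxit=20):
    """Outer Newton loop of a Newton-Krylov-Schwarz method (sketch).
    The inner Krylov/Schwarz solve of J dx = F(x) is replaced here by
    a direct 2x2 Cramer's-rule solve for illustration."""
    for _ in range(maxit):
        f0, f1 = F(x)
        if max(abs(f0), abs(f1)) < tol:
            break
        (a, b), (c, d) = J(x)
        det = a * d - b * c
        dx0 = (d * f0 - b * f1) / det   # solve J dx = F(x)
        dx1 = (a * f1 - c * f0) / det
        x = (x[0] - dx0, x[1] - dx1)    # Newton update x <- x - dx
    return x

# toy system: x^2 + y^2 = 2 and x - y = 0, with root (1, 1)
F = lambda x: (x[0]**2 + x[1]**2 - 2, x[0] - x[1])
J = lambda x: ((2*x[0], 2*x[1]), (1.0, -1.0))
root = newton_solve(F, J, (2.0, 0.5))
```

In the aerodynamics setting the residual F comes from the discretized Euler equations on the unstructured grid, and the inner solve is where the parallel Schwarz preconditioning does its work.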
| J.-P. Navarro, "IBM SP High-Performance Networking with a GRF," white paper, Preprint ANL/MCS-P747-1298. | Increasing use of highly distributed applications, demand for faster data exchange, and highly parallel applications can push the limits of conventional external networking for IBM SP sites. In technical computing applications we have observed a growing use of a pipeline of hosts and networks collaborating to collect, process, and visualize large amounts of real-time data. The GRF, a high-performance IP switch from Ascend and IBM, is the first backbone network switch to offer a media card that can directly connect to an SP Switch. This enables switch-attached hosts in an SP complex to communicate at near SP Switch speeds with other GRF-attached hosts and networks. |
|