
1995 Abstracts of MCS Reports and Preprints


Reports

A. K. Bansal, "Establishing a Framework for Comparative Analysis of Genome Sequences," ANL/MCS-TM-209 (June 1995). This report describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignment. The framework integrates the information derived from a multiple sequence alignment and a phylogenetic tree (a hypothetical tree of evolution) to derive new properties about sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations have been described to manipulate a multiple sequence alignment and to derive mutation-related information from a phylogenetic tree by superimposing parsimonious analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property in a column despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
C. Bischof, A. Carle, and P. Khademi, "Fortran 77 Interface Specification to the SparsLinC 1.0 Library," ANL/MCS-TM-196 (May 1995). The SparsLinC library, written in C, has been developed for exploiting sparsity in automatic differentiation of codes. Issues pertaining to the proper interface to the library from Fortran programs are discussed, including the interpretation of Fortran INTEGERs as C pointers, and the representation of Fortran precisions in C. The Appendix contains the full set of Fortran Interfaces to the SparsLinC library.
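The interface issue mentioned above, handing a C object back to Fortran through an INTEGER, can be sketched as follows. This is an illustration only; the routine names and the handle-table scheme are hypothetical and are not the actual SparsLinC interface.

    /* Illustrative sketch: a C library exposes an opaque object to Fortran 77
       as an INTEGER "handle".  The trailing underscore and pass-by-reference
       arguments follow a typical f77/C calling convention. */
    #include <stdlib.h>

    #define MAX_HANDLES 1024
    static void *handle_table[MAX_HANDLES];   /* no overflow check, for brevity */
    static int   next_handle = 0;

    /* Fortran call:  CALL XSPNEW(IHANDLE) */
    void xspnew_(int *ihandle)
    {
        handle_table[next_handle] = malloc(64);   /* stand-in for a sparse vector */
        *ihandle = next_handle++;                 /* Fortran sees only an INTEGER */
    }

    /* Fortran call:  CALL XSPFRE(IHANDLE) */
    void xspfre_(int *ihandle)
    {
        free(handle_table[*ihandle]);
        handle_table[*ihandle] = NULL;
    }

Casting the pointer value directly into the INTEGER is the other common convention; the handle table avoids assumptions about relative pointer and INTEGER sizes.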
C. Bischof, A. Carle, P. Khademi, A. Mauer, and P. Hovland, "ADIFOR 2.0 User's Guide (Revision D)," ANL/MCS-TM-192 (June 1998). Automatic differentiation is a technique for computing the derivatives of functions described by computer programs. ADIFOR implements automatic differentiation by transforming a collection of FORTRAN 77 subroutines that compute a function f into new FORTRAN 77 subroutines that compute the derivatives of the outputs of f with respect to a specified set of inputs of f. This guide describes step by step how to use version 2.0 of ADIFOR to generate derivative code. Familiarity with UNIX and FORTRAN 77 is assumed.
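As a minimal illustration of the forward-mode idea underlying such source transformation (a hand-written sketch, not ADIFOR-generated code): each variable carries its value together with its derivative, and every operation propagates both via the chain rule.

    #include <math.h>
    #include <stdio.h>

    typedef struct { double v, d; } dual;          /* value and d(value)/dx */

    static dual d_mul(dual a, dual b) { return (dual){ a.v*b.v, a.d*b.v + a.v*b.d }; }
    static dual d_sin(dual a)         { return (dual){ sin(a.v), cos(a.v)*a.d };    }

    int main(void)
    {
        dual x = { 2.0, 1.0 };                     /* seed: dx/dx = 1        */
        dual y = d_mul(x, d_sin(x));               /* y = x*sin(x)           */
        printf("y = %g, dy/dx = %g\n", y.v, y.d);  /* dy/dx = sin(x)+x*cos(x) */
        return 0;
    }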
D. W. Braun, G. W. Crabtree, H. G. Kaper, A. E. Koshelev, G. K. Leaf, D. M. Levine, and V. M. Vinokur, "The Structure of a Moving Vortex Lattice," ANL/MCS-TM-211 (November 1995). Numerical solutions of the time-dependent Ginzburg-Landau equations show a new mechanism for plastic motion of a driven vortex lattice in a clean superconductor. The mechanism, which involves the creation of a defect superstructure, is intrinsic to the moving vortex lattice and is independent of bulk pinning. Other structural features found in the solutions include a reorientation of the vortex lattice and a gradual healing of lattice defects under the influence of a transport current.
E. Coskun and M. K. Kwong, "Parallel Solution of the Time-dependent Ginzburg-Landau Equations and Other Experiences Using BlockComm-Chameleon and PCN on the IBM SP, Intel iPSC/860, and Clusters of Workstations," ANL-95/49 (September 1995). Time-dependent Ginzburg-Landau (TDGL) equations are considered for modeling a thin-film, finite-size superconductor placed under a magnetic field. The problem then leads to the use of so-called natural boundary conditions. The computational domain is partitioned into subdomains, and bond variables are used in obtaining the corresponding discrete system of equations. An efficient time-differencing method based on the forward Euler method is developed. Finally, a variable-strength magnetic field resulting in a vortex motion in Type II high-T_c superconducting films is introduced. We tackled our problem using two different state-of-the-art parallel computing tools: BlockComm/Chameleon and PCN. We had access to two high-performance distributed-memory supercomputers: the Intel iPSC/860 and the IBM SP1. We also tested the codes using, as a parallel computing environment, a cluster of Sun Sparc workstations.
I. T. Foster and B. R. Toonen, "Load-Balancing Algorithms for the Parallel Community Climate Model," ANL/MCS-TM-190 (January 1995). Implementations of climate models on scalable parallel computer systems can suffer from load imbalances resulting from temporal and spatial variations in the amount of computation required for physical parameterizations such as solar radiation and convective adjustment. We have developed specialized techniques for correcting such imbalances. These techniques are incorporated in a general-purpose, programmable load-balancing library that allows the mapping of computation to processors to be specified as a series of maps generated by a programmer-supplied load-balancing module. The communication required to move from one map to another is performed automatically by the library, without programmer intervention. In this paper, we describe the load-balancing problem and the techniques that we have developed to solve it. We also describe specific load-balancing algorithms that we have developed for PCCM2, a scalable parallel implementation of the Community Climate Model, and present experimental results that demonstrate the effectiveness of these algorithms on parallel computers. The load-balancing library developed in this work is available for use in other climate models.
I. Foster, J. Garnett, and S. Tuecke, "Nexus User's Guide," ANL/MCS-TM-204 (February 1995, draft). This document describes the command line arguments and other features that are common to any program that incorporates the Nexus runtime library. It is divided into sections describing the execution, debugging, tuning, and profiling of such programs. It is not intended as a compiler writer's guide, and does not include any information on the Nexus interface or other internal details.
I. Foster, C. Kesselman, and S. Tuecke, "Nexus: Runtime Support for Task-Parallel Programming Languages," ANL/MCS-TM-205. A runtime system provides a parallel language compiler with an interface to the low-level facilities required to support interaction between concurrently executing program components. Nexus is a portable runtime system for task-parallel programming languages. Distinguishing features of Nexus include its support for multiple threads of control, dynamic processor acquisition, dynamic address space creation, a global memory model via interprocessor references, and asynchronous events. In addition, it supports heterogeneity at multiple levels, allowing a single computation to utilize different programming languages, executables, processors, and network protocols. Nexus is currently being used as a compiler target for two task-parallel languages: Fortran M and Compositional C++. In this report, we present the Nexus design, outline techniques used to implement Nexus on parallel computers, show how it is used in compilers, and compare its performance with that of another runtime system.
The PORTS Consortium, "The PORTS0 Interface," ANL/MCS-TM-203 (Feb. 1995, draft). The PORTS (POrtable RunTime System) group was established to address the problems of constructing a common runtime system to be used as a compiler target for various task- and data-parallel languages. One result of this group's efforts is the definition of an applications programming interface, the PORTS level-zero interface (PORTS0). This interface comprises lightweight thread functions and a core set of reentrant library routines. This report describes the PORTS0 interface.
W. Gropp, "Users Manual for doctext: Producing Documentation from C Source Code," ANL/MCS-TM-206 (March 1995). One of the major problems that software library writers face, particularly in a research environment, is the generation of documentation. Producing good, professional-quality documentation is tedious and time consuming. Often, no documentation is produced. For many users, however, much of the need for documentation may be satisfied by a brief description of the purpose and use of the routines and their arguments. Even for more complete, hand-generated documentation, this information provides a convenient starting point. We describe here a tool that may be used to generate documentation about programs written in the C language. It uses a structured comment convention that preserves the original C source code and does not require any additional files. The markup language is designed to be an almost invisible structured comment in the C source code, retaining readability in the original source. Documentation in a form suitable for the Unix man program (nroff), LaTeX, and the World Wide Web (WWW) can be produced.
W. Gropp, "Users Manual for tohtml: Producing True Hypertext Documents from LaTeX," ANL/MCS-TM-207 (March 1995). The World Wide Web has made it possible to use and disseminate documents as "hypertext." One of the major advantages of hypertext over conventional text is that references to other documents or items can be linked directly into the document, allowing the easy retrieval of related information. A collection of documents can also be read this way, jumping from one document to another based on the interests of the reader. This does require that the hypertext documents be extensively cross-linked. Unfortunately, most existing documents are designed as linear documents. Even worse, most authors still think of documents as linear structures, to be read from front to back. To deal with this situation, a number of tools have been created that take documents in an existing word-processing system or markup language and generate "HTML," the hypertext markup language used on the Web. While this process makes a single document available in a convenient form on the Web, it does not give access to cross-document linking, a major advantage of hypertext. This manual describes a program, {\tt tohtml}, that takes LaTeX input files, as well as files of link information, and produces a hypertext document that can contain extensive cross-links. A related program, {\tt doctext}, aids in the generation of manual pages that can be referenced by a LaTeX document. 
W. Gropp, "Users Manual for bfort: Producing Fortran Interfaces to C Source Code," ANL/MCS-TM-208 (March 1995). In many applications, the most natural computer language to write in may be different from the most natural language to provide a library in. For example, many scientific computing applications are written in Fortran, while many software libraries---particularly those dealing with complicated data structures or dynamic memory management---are written in C. Providing an interface so that Fortran programs can call routines written in C can be a tedious and error-prone process. We describe here a tool that automatically generates a Fortran-callable wrapper for routines written in C, using only a small, structured comment and the declaration of the routine in C. This tool has been used on two large software packages, PETSc and the MPICH implementation of MPI.
Paul Hovland, "Using ADIFOR 1.0 to Compute Hessians," ANL/MCS-TM-195 (April 1995). ADIFOR provides a simple means to produce code for the first derivatives of functions through the technique of automatic differentiation. However, the fact that ADIFOR currently cannot produce code to compute second derivatives limits its usefulness for certain applications. This paper describes how ADIFOR and related tools can be used to produce code that does compute second derivatives and discusses how to use this code. Conclusions are presented about the limitations of this method and how it might compare with second-derivative code produced directly by ADIFOR.
David Levine, "Users Guide to the PGAPack Parallel Genetic Algorithm Library," Technical Report ANL-95/18 (January 1996). PGAPack is a parallel genetic algorithm library that is intended to provide most capabilities desired in a genetic algorithm package, in an integrated, seamless, and portable manner. Key features of PGAPack are the ability to be called from Fortran or C, that it is executable on uniprocessors, multiprocessors, multicomputers, and workstation networks, and that it has binary-, integer-, real-, and character-valued native data types. Moreover, it has object-oriented data structure neutral design, parameterized population replacement, and multiple choices for selection, crossover, and mutation operators. It is easy to integrate hill-climbing heuristics and has an easy-to-use interface for novice and application users. It also has multiple levels of access for expert users, full extensibility to support custom operators and new data types, extensive debugging facilities, and a large set of example problems.
David A. Lifka, Mark W. Henderson, and Karen Rayl, "Users Guide to the Argonne SP Scheduling System," ANL/MCS-TM-201 (May 1995). During the past five years scientists discovered that modern UNIX workstations connected with ethernet and fiber networks could provide enough computational performance to compete with the supercomputers of the day. As this concept became increasingly popular, the need for distributed queuing and scheduling systems became apparent. Today, supercomputers, such as Argonne National Laboratory's IBM SP system, can provide more CPU and networking speed than can be obtained from these networks of workstations. These modern supercomputers look like clusters of workstations, however, so developers felt that the scheduling systems that were previously used on clusters of workstations should still apply. After trying to apply some of these scheduling systems to Argonne's SP environment, it became obvious that these two computer environments have very different scheduling needs. Recognizing this need and realizing that no one has addressed it, we developed a new scheduling system. The approach taken in creating this system was unique in that user input and interaction were encouraged throughout the development process. Thus, a scheduler was built that actually worked the way the users wanted it to work. This document serves a dual purpose. It is both a user's guide and an administrator's guide for the ANL SP scheduling system.
W. McCune, "A Case Study in Automated Theorem Proving: A Difficult Problem about Commutators," ANL/MCS-TM-202 (February 1995). This paper shows how the automated deduction system OTTER was used to prove the group theory theorem x**3 = e => [[[y,z],u],v] = e, where e is the identity, and [x,y] is the commutator x'y'xy. This is a difficult problem for automated provers, and several lengthy searches were run before a proof was found. Problem formulation and search strategy played a key role in the success. I believe that ours is the first automated proof of the theorem.
Michael Paul Mesnier, "The Network-Enabled Optimization System Server," ANL/MCS-TM-210 (August 1995). Mathematical optimization is a technology under constant change and advancement, drawing upon the most efficient and accurate numerical methods to date. Further, these methods can be tailored for a specific application or generalized to accommodate a wider range of problems. This perpetual change creates an ever-growing field, one that is often difficult to stay abreast of. Hence the impetus behind the Network-Enabled Optimization System (NEOS) server, which aims to provide users, both novice and expert, with a guided tour through the expanding world of optimization. The NEOS server is responsible for bridging the gap between users and the optimization software they seek. More specifically, the NEOS server will accept optimization problems over the Internet and return a solution to the user either interactively or by e-mail. This paper discusses the current implementation of the server.
G. D. Pusch, C. Bischof, and A. Carle, "On Automatic Differentiation of Codes with COMPLEX Arithmetic with Respect to Real Variables," ANL/MCS-TM-188 (June 1995). We explore what it means to apply automatic differentiation with respect to a set of real variables to codes containing complex arithmetic. That is, both dependent and independent variables with respect to differentiation are real variables, but in order to exploit features of complex mathematics, part of the code is expressed by employing complex arithmetic. We investigate how one can apply automatic differentiation to complex variables if one exploits the homomorphism of the complex numbers C onto R**2. It turns out that, by and large, the usual rules of differentiation apply, but subtle differences in special cases arise for sqrt(), abs(), and the power operator.
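As a small worked instance of the R**2 viewpoint (notation ours, not the report's), write z(t) = x(t) + i y(t) for a real parameter t; the usual chain rule then gives, for example,

\[
\frac{d}{dt}\,\lvert z(t) \rvert \;=\; \frac{x\,\dot{x} + y\,\dot{y}}{\sqrt{x^{2} + y^{2}}},
\]

which is the ordinary rule for a function of two real variables but is undefined at z = 0; abs(), sqrt(), and the power operator require exactly this kind of special-case treatment.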

Preprints

Christian H. Bischof, Alan Carle, Peyvand M. Khademi, and Gordon Pusch, "Automatic Differentiation: Obtaining Fast and Reliable Derivatives---Fast," Preprint MCS-P484-1194. In this paper, we introduce automatic differentiation as a method for computing derivatives of large computer codes. After a brief discussion of methods of differentiating codes, we review automatic differentiation and introduce the ADIFOR automatic differentiation tool. We highlight some applications of ADIFOR to large industrial and scientific codes, and discuss the effectiveness and performance of our approach. Finally, we discuss sparsity in automatic differentiation and introduce the SparsLinC library.
Ali Bouaricha and Jorge J. More, "Impact of Partial Separability on Large-Scale Optimization," J. Comput. Optimization Appl. 7, no. 1 (1997), pp. 27-40. Also Preprint MCS-P487-0195, November 1995. ELSO is an environment for the solution of large-scale optimization problems. With ELSO the user is required to provide only code for the evaluation of a partially separable function. ELSO exploits the partial separability structure of the function to compute the gradient efficiently using automatic differentiation. We demonstrate ELSO's efficiency by comparing the various options available in ELSO. Our conclusion is that the hybrid option in ELSO provides performance comparable to the hand-coded option, while having the significant advantage of not requiring a hand-coded gradient or the sparsity pattern of the partially separable function. In our test problems, which have carefully coded gradients, the computing time for the hybrid AD option is within a factor of two of the hand-coded option.
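For reference (this is the standard definition, not specific to ELSO), a partially separable function has the form

\[
f(x) \;=\; \sum_{i=1}^{m} f_{i}(x), \qquad x \in \mathbb{R}^{n},
\]

where each element function \(f_{i}\) depends on only a few components of \(x\); the gradient \(\nabla f(x) = \sum_{i} \nabla f_{i}(x)\) is then a sum of sparse element gradients, which is the structure that this entry and the next exploit via automatic differentiation.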
C. Bischof, A. Bouaricha, P. M. Khademi, and J. J. More', "Computing Gradients in Large-Scale Optimization Using Automatic Differentiation," INFORMS J. on Computing 9, no. 2 (Spring 1997). Also Preprint MCS-P488-0195. The accurate and efficient computation of gradients for partially separable functions is central to the solution of large-scale optimization problems, since these functions are ubiquitous in large-scale problems. We describe two approaches for computing gradients of partially separable functions via automatic differentiation. In our experiments we employ the ADIFOR (Automatic Differentiation of Fortran) tool and the SparsLinC (Sparse Linear Combination) library. We use applications from the MINPACK-2 test problem collection to compare the numerical reliability and computational efficiency of these approaches with hand-coded derivatives and approximations based on differences of function values. Our conclusion is that automatic differentiation is the method of choice, providing code for the efficient computation of the gradient without the need for tedious hand coding.
M. K. Kwong and B. Lin, "W-Transform Method for Feature-oriented Multiresolution Image Retrieval," Preprint MCS-P489-0195. We present a local-feature-oriented image indexing and retrieval method based on Kwong and Tang's W-transform. Multiresolution histogram comparison is an effective method for content-based image indexing and retrieval. However, most recent approaches perform multiresolution analysis for whole images but do not exploit the local features present in the images. Since the W-transform can handle images of arbitrary size, with no periodicity assumptions, it provides a natural tool for analyzing local image features and building indexing systems based on such features. In our approach, the histograms of the local features of images are used in the indexing system. The system not only can retrieve images that are similar or identical to the query images but also can retrieve images that contain features specified in the query images, even if the retrieved images as a whole might be very different from the query images. The local-feature-oriented method also provides a speed advantage over the global multiresolution histogram comparison method. The feature-oriented approach is expected to be applicable in managing large-scale image systems such as video databases and medical image databases.
I. Foster and C. Kesselman, "Language Constructs and Runtime Systems for Compositional Parallel Programming," Preprint MCS-P490-0195. In task-parallel programs, diverse activities can take place concurrently, and communication and synchronization patterns are complex and not easily predictable. Previous work has identified compositionality as an important design principle for task-parallel programs. In this paper, we discuss alternative approaches to the realization of this principle. We first provide a review and critical analysis of Strand, an early compositional programming language. We examine the strengths of the Strand approach and also its weaknesses, which we attribute primarily to the use of a specialized language. Then, we present an alternative programming language framework that overcomes these weaknesses. This framework uses simple extensions to existing sequential languages (C++ and Fortran) and a common runtime system to provide a basis for the construction of large, task-parallel programs. We also discuss the run-time system techniques required to support these languages on parallel and distributed computer systems.
P. Hovland, C. Bischof, D. Spiegelman, and M. Casella, "Efficient Derivative Codes through Automatic Differentiation and Interface Contraction: An Application in Biostatistics," SIAM J. Sci. Comput. 18(4) (1997), pp. 1056-1066. Also Preprint MCS-P491-0195. Developing code for computing the first- and higher-order derivatives of a function by hand can be very time-consuming and is prone to errors. Automatic differentiation has proven capable of producing derivative codes with very little effort on the part of the user. Automatic differentiation avoids the truncation errors characteristic of divided-difference approximations. However, the derivative code produced by automatic differentiation can be significantly less efficient than one produced by hand. This shortcoming may be overcome by utilizing insight into the high-level structure of a computation. This paper focuses on how to take advantage of the fact that the number of variables passed between subroutines frequently is small compared with the number of the variables with respect to which we wish to differentiate. Such an "interface contraction," coupled with the associativity of the chain rule for differentiation, allows us to apply automatic differentiation in a more judicious fashion, resulting in much more efficient code for the computation of derivatives. A case study involving the ADIFOR (Automatic Differentiation of Fortran) tool and a program for maximizing a logistic-normal likelihood function developed from a problem in nutritional epidemiology is examined, and performance figures are presented.
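Schematically (notation ours, not the paper's): if the computation factors through a narrow interface as y = g(u) with u = h(x), where h maps R^n to R^k and k << n, the chain rule gives

\[
\nabla_{x}\, y \;=\; g'(u)\, h'(x),
\]

so only k derivative components need be propagated through g, with the result multiplied by the k-by-n Jacobian of h afterward, instead of carrying all n directions through the entire computation.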
J. N. Lyness and S. Joe, "Triangular Canonical Forms for Lattice Rules of Prime-Power Order," Preprint MCS-P492-0495. In this paper we develop a theory of t-cycle D-Z representations for s-dimensional lattice rules of prime-power order. Of particular interest are canonical forms that, by definition, have a D-matrix consisting of the nontrivial invariants. Among these is a family of triangular forms, which, besides being canonical, have the defining property that their Z-matrix is a column-permuted version of a unit upper-triangular matrix. Triangular forms may be obtained constructively by using sequences of elementary transformations based on elementary matrix algebra. Our main result is to define a unique canonical form for prime-power rules. This ultratriangular form is a triangular form, is easy to recognize, and may be derived in a straightforward manner.
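For orientation, the t-cycle D-Z form referred to in the abstract writes an s-dimensional lattice rule as (standard notation, stated here as a reminder)

\[
Qf \;=\; \frac{1}{d_{1} d_{2} \cdots d_{t}}
\sum_{j_{1}=1}^{d_{1}} \cdots \sum_{j_{t}=1}^{d_{t}}
f\!\left( \left\{ \sum_{i=1}^{t} \frac{j_{i}}{d_{i}}\, \mathbf{z}_{i} \right\} \right),
\]

where D = diag(d_1, ..., d_t), the rows z_i of the Z-matrix are integer s-vectors, and the braces denote the componentwise fractional part; the canonical and triangular forms of the paper are particular choices of the pair (D, Z).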
W. McCune and R. Padmanabhan, "Single Identities for Lattice Theory and for Weakly Associative Lattices," Algebra Universalis 36 (1996), pp. 436-449. Also Preprint MCS-P493-0395. We present a single identity for the variety of all lattices that is much simpler than those previously known to us. We also show that the variety of weakly associative lattices is one-based, and we present a generalized one-based theorem for subvarieties of weakly associative lattices that can be defined with absorption laws. The automated theorem-proving program Otter was used in a substantial way to obtain the results.
A. J. Conley and H. B. Keller, "Wavy Taylor Vortices in Plane Couette Flow," Preprint MCS-P495-0195. Path-following techniques applied to a spectral approximation of the solution of the Navier-Stokes Equations have revealed the existence of a new class of solutions to the plane Couette flow problem.
M. Huang, M. Papka, T. DeFanti, D. Levine, L. Turner, and L. Kettunen, "Virtual Reality Visualization of Accelerator Magnets," Preprint MCS-P496-0295. We describe the use of the CAVE virtual reality visualization environment as an aid to the design of accelerator magnets. We have modeled an elliptical multipole wiggler magnet being designed for use at the Advanced Photon Source at Argonne National Laboratory. The CAVE environment allows us to explore and interact with the 3-D visualization of the magnet. Capabilities include changing the number of periods of the magnet displayed, changing the icons used for displaying the magnetic field, and changing the current in the electromagnet and observing the effect on the magnetic field and particle beam trajectory through the field.
E. T. Olsen and B. Lin, "A Wavelet Phase Filter for Emission Tomography," Preprint MCS-P497-0295. The presence of a high level of noise is characteristic of some tomographic imaging techniques such as positron emission tomography. Wavelet methods can smooth out noise while preserving significant features of images. Mallat et al. proposed a wavelet-based denoising scheme exploiting wavelet modulus maxima, but the scheme is sensitive to noise. In this study, we explore the properties of wavelet phase, with a focus on reconstruction of emission tomography images. Specifically, we show that the wavelet phase of regular Poisson noise under a Haar-type wavelet transform converges in distribution to a random variable uniformly distributed on [0, 2 pi). We then propose three wavelet-phase-based denoising schemes which exploit this property: edge tracking, local phase variance thresholding, and scale phase variation thresholding. Numerical results are also presented. The numerical experiments indicate that wavelet-phase techniques show promise for wavelet-based denoising methods.
D. A. Lifka, "The ANL/IBM SP Scheduling System," Preprint MCS-P498-0395. During the past five years scientists discovered that modern UNIX workstations connected with ethernet and fibre networks could provide enough computational performance to compete with the supercomputers of the day. As this concept became increasingly popular, the need for distributed queuing and scheduling systems became apparent. Systems such as DQS from Florida State were developed and worked very well. Today, supercomputers, like Argonne National Laboratory's IBM SP system, can provide more CPU and networking speed than can be obtained from these networks of workstations. These modern supercomputers look like clusters of workstations, however, so developers felt that the scheduling systems that were previously used on clusters of workstations should still apply. After trying to apply some of these scheduling systems to Argonne's SP environment, it became obvious that these two computer environments have very different scheduling needs. Recognizing this need and realizing that no one has addressed it, I developed a new scheduling system. The approach taken in creating this system was unique in that user input and interaction were encouraged throughout the development process. Thus, a scheduler was built that actually "worked" the way the users wanted it to work.
William Gropp, "An Introduction to Performance Debugging for Parallel Computers," Preprint MCS-P500-0295, April 1995. Programming parallel computers for performance is a difficult task that requires careful attention to both single-node performance and data exchange between processors. This paper discusses some of the sources of poor performance, ways to identify them in an application, and a few ways to address these issues.
Shannon Bradshaw, Thomas Canfield, John Kokinis, Terrence Disz, "An Interactive Virtual Environment for Finite Element Analysis," Proc. High-Performance Computing '95, Phoenix, Arizona, April 12-13, 1995. Also Preprint MCS-P501-0395. Virtual environments (VE) provide a powerful human-computer interface that opens the door to exciting new methods of interaction with high-performance computing applications in several areas of research. We are interested in the use of virtual environments as a user interface to real-time simulations used in rapid prototyping procedures. Consequently, we are developing methods for coupling finite element models of complex mechanical systems with a VE interface for real-time interaction.
Terrence Disz, Michael Papka, Rick Stevens, Michael Pellegrino, and Valerie Taylor, "Virtual Reality Visualization of Parallel Molecular Dynamics Simulation," Proc. High-Performance Computing '95, Phoenix, Arizona, April 12-13, 1995. Also Preprint MCS-P502-0395. When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom-to-atom interactions and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume and the mass and velocity of each molecule. The computational information available for visualization is the atom-to-atom interaction within each time step, the atom-to-processor mapping, and the energy rescaling events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.
Valerie E. Taylor, Rick Stevens, and Thomas Canfield, "Performance Models of Interactive, Immersive Visualization for Scientific Applications," Proc. High-Performance Computing for Computer Graphics and Visualization Conference, July 3-4, 1995, Swansea, U.K. Also Preprint MCS-P503-0395. In this paper we develop a performance model for analyzing the end-to-end lag in a combined supercomputer/virtual environment. We first present a general model and then use this model to analyze the end-to-end lag of a finite element simulation with which a user interacts in real time via the CAVE Automatic Virtual Environment. The simulation is executed on an IBM SP-2 parallel supercomputer. Our model decouples the viewpoint lag (not involving the simulation) from the interaction lag (using the results of the simulations). This model allows one to understand the relative contributions to end-to-end lag of the following components: rendering, tracking, network latency, simulation time, and various types of synchronization lags. The results of the study indicate that rendering and network latency are the major contributors to the end-to-end lag.
S. A. Gabriel and A. S. Kydes, "A Nonlinear Complementarity Approach for the National Energy Modeling System," Preprint MCS-P504-0395. The National Energy Modeling System (NEMS) is a large-scale mathematical model that computes equilibrium fuel prices and quantities in the U.S. energy sector. At present, to generate these equilibrium values, NEMS sequentially solves a collection of linear programs and nonlinear equations. The NEMS solution procedure then incorporates the solutions of these linear programs and nonlinear equations in a nonlinear Gauss-Seidel approach. We describe how the current version of NEMS can be formulated as a particular nonlinear complementarity problem (NCP), thereby possibly avoiding current convergence problems. In addition, we show that the NCP format is equally valid for a more general form of NEMS. We also describe several promising approaches for solving the NCP form of NEMS based on recent Newton-type methods for general NCPs. These approaches share the feature of needing to solve their direction-finding subproblems only approximately. Hence, they can effectively exploit the sparsity inherent in the NEMS NCP.
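For readers unfamiliar with the format, the nonlinear complementarity problem is (standard definition; the interpretation of F below only indicates the flavor of the NEMS application):

\[
\text{find } x \in \mathbb{R}^{n} \ \text{ such that } \quad x \ge 0, \qquad F(x) \ge 0, \qquad x^{T} F(x) = 0 .
\]

In an energy-equilibrium setting, x would collect prices and quantities and F the corresponding market-balance relations, with complementarity expressing that a variable can be positive only when its associated relation holds with equality.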
J. More' and Z. Wu, "Global Continuation for Distance Geometry Problems," SIAM J. Optim. 7, no. 3 (August 1997), pp. 814-836. Also Preprint MCS-P505-0395. Distance geometry problems arise in the interpretation of NMR data and in the determination of protein structure. We formulate the distance geometry problem as a global minimization problem with special structure, and show that global smoothing techniques and a continuation approach for global optimization can be used to determine solutions of distance geometry problems with a nearly 100% probability of success.
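One standard least-squares formulation of the kind described (notation ours) is

\[
\min_{x_{1}, \dots, x_{m} \in \mathbb{R}^{3}} \ \sum_{(i,j) \in S} \left( \lVert x_{i} - x_{j} \rVert^{2} - \delta_{ij}^{2} \right)^{2},
\]

where the \(\delta_{ij}\) are the given pairwise distances and S is the set of pairs for which data are available; a global minimizer with objective value zero reproduces the data exactly, which is why global rather than local optimization is required.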
Man Kam Kwong, P. T. Peter Tang, and Biquan Lin, "A Real-Time MPEG Software Decoder Using a Portable Message-Passing Library," Preprint MCS-P506-0395. We present a real-time MPEG software decoder that uses message-passing libraries such as MPL, p4, and MPI. The parallel MPEG decoder currently runs on the IBM SP system but can be easily ported to other parallel machines. This paper discusses our parallel MPEG decoding algorithm as well as the parallel programming environment in which it runs. Several technical issues are discussed, including balancing of decoding speed, memory limitations, I/O capacities, and optimization of MPEG decoding components. This project shows that a real-time portable software MPEG decoder is feasible on a general-purpose parallel machine.
Christian H. Bischof, "On the Automatic Differentiation of Computer Programs," Preprint MCS-P507-0495. Automatic differentiation (AD) is a methodology for developing sensitivity-enhanced versions of arbitrary computer programs. In this paper, we provide some background information on AD and address some frequently asked questions. We introduce the ADIFOR and ADIC tools for the automatic differentiation of Fortran 77 and ANSI-C programs, respectively, and give an example of applying ADIFOR in the context of the optimization of multibody systems.
Steven A. Gabriel, "An NE/SQP Method for the Bounded Nonlinear Complementarity Problem," Preprint MCS-P508-0695. NE/SQP is a recent algorithm that has proven quite effective for solving the pure and mixed forms of the nonlinear complementarity problem (NCP). NE/SQP is robust in the sense that its direction-finding subproblems are always solvable; in addition, the convergence rate of this method is Q-quadratic. In this paper we consider a generalized version of NE/SQP proposed by Pang and Qi that is suitable for the bounded NCP. We extend their work by demonstrating a stronger convergence result and then test a proposed method on several numerical problems.
I. T. Foster, B. Toonen, and P. H. Worley, "Performance of Massively Parallel Computers for Spectral Atmospheric Models," J. Atmospheric and Oceanic Technology 13, no. 5 (October 1996), pp. 1031-1045. Also Preprint MCS-P509-0495. Massively parallel processing (MPP) computer systems use high-speed interconnection networks to link hundreds or thousands of RISC microprocessors. With each microprocessor having a peak performance of 100 or more Mflops/sec, there is at least the possibility of achieving very high performance. However, the question of exactly how to achieve this performance remains unanswered. MPP systems and vector multiprocessors require very different coding styles. Different MPP systems have widely varying architectures and performance characteristics. For most problems, a range of different parallel algorithms is possible, again with varying performance characteristics. In this paper, we provide a detailed, fair evaluation of MPP performance for a weather and climate modeling application. Using a specially designed spectral transform code, we study performance on three different MPP systems: Intel Paragon, IBM SP2, and Cray T3D. We take great care to control for performance differences due to varying algorithmic characteristics. The results yield insights into MPP performance characteristics, parallel spectral transform algorithms, and coding style for MPP systems. We conclude that it is possible to construct parallel models that achieve multi-Gflops/sec performance on a range of MPPs, if the models are constructed to allow runtime selection of appropriate algorithms.
M. Garbey and H. G. Kaper, "Heterogeneous Domain Decomposition for Singularly Perturbed Elliptic Boundary Value Problems," SIAM J. Numer. Anal. 34, no. 4 (August 1997), pp. 1513-1544. Also Preprint MCS-P510-0495. A heterogeneous domain-decomposition method is presented for the numerical solution of singularly perturbed elliptic boundary value problems. The method, which is parallelizable at various levels, uses several ideas of asymptotic analysis. The subdomains match the domains of validity of the local ("inner" and "outer") asymptotic expansions, and cut-off functions are used to match solutions in neighboring subdomains. The positions of the interfaces, as well as the mesh widths, depend on the small parameter e. On the subdomains, iterative solution techniques are used, which may vary from one subdomain to another. The global convergence rate depends on e; it generally increases like some power of (log(1/e))**(-1) as e decreases to 0. The method is illustrated on several two-dimensional singular perturbation problems.
A. J. Conley, "Centrifugal Destabilization and Restabilization of Plane Shear Flows," Preprint MCS-P511-0395. The flow of an incompressible viscous fluid between parallel plates becomes unstable when the plates are tumbled. As the tumbling rate increases, the flow restabilizes. This phenomenon is elucidated by path-following techniques. The solution of the Navier-Stokes equations is approximated by spectral techniques. The linear stability of these solutions is studied.
Christian H. Bischof, William T. Jones, Andrew Mauer, and Jamshid Samareh-Abolhassani, "Experiences with the Application of the ADIC Automatic Differentiation Tool to the CSCMDO 3-D Volume Grid Generation Code," 34th AIAA Aerospace Sciences Meeting and Exhibit, Jan. 15-18, 1996, Reno, Nev., AIAA 96-0716. Also Preprint MCS-P512-0595, October 1995. Automatic differentiation (AD) is a methodology for developing reliable sensitivity-enhanced versions of arbitrary computer programs with little human effort. It can vastly accelerate the use of advanced simulation codes in multidisciplinary design optimization, since the time for generating and verifying derivative codes is greatly reduced. In this paper, we report on the application of the recently developed ADIC automatic differentiation tool for ANSI C programs to the CSCMDO multiblock three-dimensional volume grid generator. The ADIC-generated code can easily be interfaced with Fortran derivative codes generated with the ADIFOR AD tool for FORTRAN 77 programs, thus providing efficient sensitivity-enhancement techniques for multilanguage, multidiscipline problems.
Robert K. Gjertsen, Jr., Mark T. Jones, and Paul E. Plassmann, "Parallel Heuristics for Improved, Balanced Graph Colorings," Preprint MCS-P513-0595. The computation of good, balanced graph colorings is an essential part of many algorithms required in scientific and engineering applications. Motivated by an effective sequential heuristic, we introduce a new parallel heuristic, PLF, and show that this heuristic has the same expected runtime under the P-RAM computational model as the scalable coloring heuristic introduced by Jones and Plassmann (JP). We present experimental results performed on the Intel DELTA that demonstrate that this new heuristic consistently generates better colorings and requires only slightly more time than the JP heuristic. In the second part of the paper we introduce two new parallel color-balancing heuristics, PDR(k) and PLF(k). We show that these heuristics have the desirable property that they do not increase the number of colors used by an initial coloring during the balancing process. We present experimental results that show that these heuristics are very effective in obtaining balanced colorings and, in addition, exhibit scalable performance.
Juan Mario Restrepo, Gary K. Leaf, and Andreas Griewank, "Circumventing Storage Limitations in Variational Data Assimilation Studies," SIAM J. on Scientific Computing 19 (1998), pp. 1586-1605. Also Preprint MCS-P515-0595. The aim of data assimilation is to infer the state of a system from a geophysical model and possibly incomplete or nonuniformly distributed spatiotemporal observational data. Used extensively in engineering control theory applications, data assimilation has relatively recently been introduced into meteorological forecasting, natural-resource recovery modeling, and climate dynamics. Variational data assimilation is a promising assimilation technique in which it is assumed that the state of the system is an extremum of a carefully chosen objective function. Provided that an adjoint model is available, the required model gradients can be computed by integrating the model forward and its adjoint backward. The gradients are then used to extremize the cost function with a suitable iterative or conjugate gradient solver. The problem we address in this study is the explosive growth in both on-line computer memory and remote storage requirements of large-scale assimilation studies, which imposes a severe physical limitation on the size of assimilation studies, even on the largest computers. By using a recursive strategy, a schedule can be constructed that enables the forward/adjoint model runs to be performed in such a way that storage requirements can be traded for longer computational times. This generally applicable strategy enables data assimilation studies on significantly larger domains than would otherwise be possible given particular hardware constraints. We show that this tradeoff is indeed viable and that when the schedule is optimized, the storage and computational times grow at most logarithmically.
J. N. Lyness and L. M. Delves, "On the Implementation of a Modified Sag-Szekeres Quadrature Method," Preprint MCS-P516-0595. We describe a modified Sag-Szekeres multidimensional quadrature algorithm and discuss its implementation as a general-purpose library procedure on serial and parallel architectures. Examples illustrate its effectiveness for both smooth and singular integrands.
W. McCune and R. Padmanabhan, "An Equational Characterization of the Conic Construction on Cubic Curves," Preprint MCS-P517-0595. An n-ary Steiner law f(x_1,x_2,...,x_n) on a projective curve \Gamma over an algebraically closed field k is a totally symmetric n-ary morphism f from \Gamma^n to \Gamma satisfying the universal identity f(x_1,x_2,...,x_{n-1},f(x_1,x_2,...,x_n)) = x_n . An element e in \Gamma is called an idempotent for f if f(e,e,...,e) = e. The binary morphism x*y of the classical chord-tangent construction on a nonsingular cubic curve is an example of a binary Steiner law on the curve, and the idempotents of * are precisely the inflection points of the curve. In this paper, we prove that if f and g are two 5-ary Steiner laws on an elliptic curve \Gamma sharing a common idempotent, then f=g. We use a new rule of inference rule =(gL)=>, extracted from a powerful local-to-global principle in algebraic geometry. This rule is implemented in the theorem-proving program OTTER. Then we use OTTER to automatically prove the uniqueness of the 5-ary Steiner law on an elliptic curve. Very much like the binary case, this theorem provides an algebraic characterization of a geometric construction process involving conics and cubics. The well-known theorem of the uniqueness of the group law on such a curve is shown to be a consequence of this result.
Tomasz Bielecki, Li M. Song, Stephen S. T. Yau, and Man Kam Kwong, "Random Wavelet Transforms, Algebraic Geometric Coding, and Their Applications in Signal Compression and De-Noising," Preprint MCS-P518-0595. The concepts of random wavelet transforms and discrete random wavelet transforms are introduced. It is shown that these transforms can lead to simultaneous compression and de-noising of signals that have been corrupted with fractional noises. Potential applications of algebraic geometric coding theory to encode the ensuing data are also discussed.
Christian H. Bischof, Peyvand M. Khademi, Ali Bouaricha, and Alan Carle, "Efficient Computation of Gradients and Jacobians by Transparent Exploitation of Sparsity in Automatic Differentiation," Optimization Methods and Software 7 (1996), pp. 1-39. Also Preprint MCS-P519-0595. Automatic differentiation (AD) is a technique that augments computer codes with statements for the computation of derivatives. The computational workhorse of AD-generated codes for first-order derivatives is the linear combination of vectors. For many large-scale problems, the vectors involved in this operation are inherently sparse. If the underlying function is a partially separable one (e.g., if its Hessian is sparse), many of the intermediate gradient vectors computed by AD will also be sparse, even though the final gradient is likely to be dense. For large Jacobian computations, every intermediate derivative vector is usually at least as sparse as the least sparse row of the final Jacobian. In this paper, we show that transparent exploitation of the sparsity inherent in derivative computation can result in dramatic gains in runtime and memory savings. For a set of gradient problems exhibiting implicit sparsity, we report on the runtime and memory requirements of computing the gradients with the ADIFOR (Automatic DIfferentiation of FORtran) tool, both with and without employing the SparsLinC (Sparse Linear Combinations) library, and show that SparsLinC can reduce runtime and memory costs by orders of magnitude. We also compute sparse Jacobians using the SparsLinC-based approach---in the process, automatically detecting the sparsity structure of the Jacobian---and show that these Jacobian results compare favorably with those of previous techniques that require a priori knowledge of the sparsity structure of the Jacobian.
Jorge J. More' and Zhijun Wu, "Epsilon-Optimal Solutions to Distance Geometry Problems via Global Continuation," Preprint MCS-P520-0595. We show that a continuation approach to global optimization with global smoothing techniques can be used to obtain e-optimal solutions to distance geometry problems. We show that determining an e-optimal solution is still an NP-hard problem when e is small. A discrete form of the Gaussian transform is proposed based on the Hermite form of Gaussian quadrature. We show that the modified transform can be used whenever the transformed functions cannot be computed analytically. Our numerical results show that the discrete Gauss transform can be used to obtain e-optimal solutions for general distance geometry problems, and in particular, to determine the three-dimensional structure of protein fragments.
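Assuming the standard Gaussian transform used for global smoothing in this line of work (the precise discrete construction is in the paper), the smoothing of f with parameter \lambda is

\[
\langle f \rangle_{\lambda}(x) \;=\; \frac{1}{\pi^{n/2} \lambda^{n}} \int_{\mathbb{R}^{n}} f(y)\, \exp\!\left( - \frac{\lVert y - x \rVert^{2}}{\lambda^{2}} \right) dy,
\]

and replacing the integral by Gauss-Hermite quadrature gives, in one dimension,

\[
\langle f \rangle_{\lambda}(x) \;\approx\; \frac{1}{\sqrt{\pi}} \sum_{k=1}^{q} w_{k}\, f(x + \lambda t_{k}),
\]

with Gauss-Hermite weights \(w_{k}\) and nodes \(t_{k}\), so that only function values are needed when the transform cannot be computed analytically.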
Paul Hovland, Steve Altus, Christian Bischof, and Ilan Kroo, "Using Automatic Differentiation with the Quasi-Procedural Method for Multidisciplinary Design Optimization," in Proc. 34th AIAA Aerospace Sciences Meeting, Reno, Nevada, January 15-18, 1996, AIAA 96-0090. Also Preprint MCS-P521-0595. As computers have become increasingly powerful, the field of design optimization has moved toward higher-fidelity models in the early stages of design. One way in which this movement is manifested is in the increasing popularity of multidisciplinary design optimization (MDO). Because the models used in MDO are large and complicated, a modular design is desirable. There are many design parameters to optimize, and the robustness of the method requires that derivatives be computed accurately and efficiently. This paper describes how the quasi-procedural program architecture developed by Takai and Kroo and the technique of automatic differentiation can be combined to address these needs effectively. The two techniques are explained, the manner in which they were integrated into a single framework is described, and the result of using this framework for an optimization problem in airplane design is presented.
Larry Wos, "The Power of Combining Resonance with Heat," J. of Automated Reasoning (to appear). Also Preprint MCS-P522-0695, August 1995. In this article, I present experimental evidence of the value of combining two strategies each of which has proved powerful in various contexts. The resonance strategy gives preference (for directing a program's reasoning) to equations or formulas that have the same shape (ignoring variables) as one of the patterns supplied by the researcher to be used as a resonator. The hot list strategy rearranges the order in which conclusions are drawn, the rearranging caused by immediately visiting and, depending on the value of the heat parameter, even immediately revisiting a set of input statements chosen by the researcher; the chosen statements are used to complete applications of inference rules rather than to initiate them. Combining these two strategies often enables an automated reasoning program to attack deep questions and hard problems with far more effectiveness than using either alone. The use of this combination in the context of cursory proof checking produced most unexpected and satisfying results, as I show here. I present the material (including commentary) in the spirit of excerpts from an experimenter's notebook, thus meeting the frequent request to illustrate how a researcher can make wise choices from among the numerous options offered by McCune's automated reasoning program OTTER. I include challenges and topics for research and, to aid the researcher, in the Appendix a sample input file and a number of intriguing proofs.
William D. Reynolds, Jr., "Image Compression Using the W-Transform," Preprint MCS-P524-0695, October 1995. We present the W-transform for a multiresolution signal decomposition. One of the differences between the wavelet transform and the W-transform is that the W-transform leads to a nonorthogonal signal decomposition. Another difference between the two is the manner in which the W-transform handles the endpoints (boundaries) of the signal. This approach does not restrict the length of the signal to be a power of two. Furthermore, it does not call for the extension of the signal; thus, the W-transform is a convenient tool for image compression. We present the basic theory behind the W-transform and include experimental simulations to demonstrate its capabilities.
S. Zafar, Y-Q. Zhang, and B. Jabbari, "Block-Classified Motion Compensation Scheme for Digital Video," Preprint MCS-P525-0695, November 1995. A novel scheme for block-based motion compensation is introduced in which a block is classified according to the energy that is directly related to the motion activity it represents. This classification allows more flexibility in controlling the bit rate and the signal-to-noise ratio and results in a reduction in motion search complexity. The method introduced is not dependent on the particular type of motion search algorithm implemented and can thus be used with any method assuming that the underlying matching criteria used is minimum absolute difference. It has been shown that the method is superior to a simple motion compensation algorithm where all blocks are motion compensated regardless of the energy resulting after the displaced difference.
J. Chen, M. Griem, J. Aarsvold, C.-T. Chen, T. Disz, R. Hudson, M. K. Kwong, and B. Lin, "A High-Performance Image Analysis and Visualization Engine for Three-dimensional Microscopy Imaging," Preprint MCS-P526-0695. Three-dimensional microscopy imaging is important for understanding complex biological assemblies. Computers and digital image acquisition systems have made possible the three-dimensional reconstruction of images obtained from optical microscopes. However, since processing such images requires tremendous CPU, storage, and I/O capacities, a high-performance computing facility is necessary in order to achieve real-time performance. But it is uneconomical (if not impossible) for a medical center or a biological research institution to operate a large computer system at the present time. The advent of high-speed networks such as asynchronous transfer mode (ATM) enables biologists and radiologists to use remote high-performance computers to process such images and visualize reconstructed 3D images in real time. In this paper, we present such a prototype system that integrates microscopy image acquisition, parallel deblurring, 3D image visualization, and high-speed networks.
Jacqueline Fleckinger-Pelle' and Hans G. Kaper, "Gauges for the Ginzburg-Landau Equations of Superconductivity," ZAMM - Z. angew. Math. Mech. 76 (1996) S2, pp. 345-348. Also Preprint MCS-P527-0795. This note discusses some gauge choices for the time-dependent Ginzburg-Landau equations of superconductivity. The equations model the state of a superconducting sample in a magnetic field near the critical temperature. Any two solutions related through a "gauge transformation" describe the same state and are physically indistinguishable. This "gauge invariance" can be exploited for analytical and numerical purposes.
Man Kam Kwong, "Orthogonally Compensated W-Multiresolution Analysis and Signal Processing," Preprint MCS-P528-0795. The concept of a W-matrix is used to give an elementary interpretation of a biorthogonal wavelet decomposition of signals. We also give a method to modify the decomposition to give an orthogonal projection on the space spanned by the scaling vectors. Roughly speaking, our treatment is a finite-length analog of the well-known theory of multiresolution analysis of Meyer and Mallat. Our approach differs in that it deals directly with the discrete case, it takes care of the boundary elements without explicit padding, and it uses a notion similar to that of semiorthogonality introduced by Chui. Our algorithm has flexibility in the choice of filter coefficients. The decomposition, orthogonalization, and restoration algorithms are computationally fast.
Rajeev Thakur and Alok Choudhary, "An Extended Two-Phase Method for Accessing Sections of Out-of-Core Arrays," Preprint MCS-P530-0595, Sept. 1995. A number of applications on parallel computers deal with very large data sets that cannot fit in main memory. In such applications, data must be stored in files on disks and fetched into memory during program execution. Parallel programs with large out-of-core arrays stored in files must read/write smaller sections of the arrays from/to files. In this paper, we describe a method for accessing sections of out-of-core arrays efficiently. Our method, the extended two-phase method, uses collective I/O: Processors cooperate to combine several I/O requests into fewer larger granularity requests, reorder requests so that the file is accessed in proper sequence, and eliminate simultaneous I/O requests for the same data. In addition, the I/O workload is divided among processors dynamically, depending on the access requests. We present performance results obtained from two real out-of-core parallel applications---matrix multiplication and a Laplace's equation solver---and several synthetic access patterns, all on the Intel Touchstone Delta. These results indicate that the extended two-phase method significantly outperformed a direct (non-collective) method for accessing out-of-core array sections.
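One ingredient of collective I/O, combining many small requests into fewer, larger, properly ordered ones, can be sketched as follows. This is an illustration only, not the extended two-phase implementation, which in addition partitions the aggregate request among processors and redistributes the data.

    /* Sketch: sort (offset, length) requests by file offset and merge
       overlapping or adjacent ones into larger contiguous requests. */
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct { long off, len; } req;

    static int cmp(const void *a, const void *b)
    {
        long d = ((const req *)a)->off - ((const req *)b)->off;
        return (d > 0) - (d < 0);
    }

    /* Coalesce n requests in place; returns the number of merged requests. */
    static int coalesce(req *r, int n)
    {
        int i, m = 0;
        qsort(r, n, sizeof(req), cmp);
        for (i = 1; i < n; i++) {
            if (r[i].off <= r[m].off + r[m].len) {        /* overlaps or touches */
                long end = r[i].off + r[i].len;
                if (end > r[m].off + r[m].len)
                    r[m].len = end - r[m].off;
            } else {
                r[++m] = r[i];
            }
        }
        return n ? m + 1 : 0;
    }

    int main(void)
    {
        req r[] = { {100, 50}, {0, 40}, {40, 60}, {400, 10} };
        int i, m = coalesce(r, 4);
        for (i = 0; i < m; i++)
            printf("read offset %ld length %ld\n", r[i].off, r[i].len);
        return 0;   /* prints two coalesced requests: (0,150) and (400,10) */
    }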
Christian H. Bischof, Gordon D. Pusch, and Ralf Knoesel, "Sensitivity Analysis of the MM5 Weather Model Using Automatic Differentiation," Computers in Physics 10, no. 6 (1996), pp. 605-612. Also Preprint MCS-P532-0895, December 1995. We present a general method for using automatic differentiation to facilitate model sensitivity analysis. Automatic differentiation techniques augment, in a completely mechanical fashion, an existing code such that it also simultaneously and efficiently computes derivatives. Our method allows the sensitivities of the code's outputs to its parameters and inputs to be determined with minimal human effort by exploiting the relationship between differentiation and formal perturbation theory. Employing this method, we performed a sensitivity study of the MM5 code, a mesoscale weather model jointly developed by Penn State University and the National Center for Atmospheric Research, comprising roughly 40,000 lines of Fortran 77 code. Our results show that AD-computed sensitivities exhibit superior accuracy compared with divided-difference approximations computed from finite-amplitude perturbations, while consuming comparable or less CPU time and less human labor. We also comment on a numerically induced precursor wave that would almost certainly have been undetectable if one had used a divided-difference method.
Rajeev Thakur, Ewing Lusk, and William Gropp, "I/O Characterization of a Portable Astrophysics Application on the IBM SP and Intel Paragon," Preprint MCS-P534-0895, August 1995. Many large-scale applications on parallel machines are bottlenecked by the I/O performance rather than the CPU or communication performance of the system. To improve the I/O performance, it is first necessary for system designers to understand the I/O requirements of various applications. This paper presents the results of a study of the I/O characteristics and performance of a real, I/O-intensive, portable, parallel application in astrophysics, on two different parallel machines---the IBM SP and the Intel Paragon. We instrumented the source code to record all I/O activity and analyzed the resulting trace files. Our results show that, for this application, the I/O consists of fairly large writes, and writing data to files is faster on the Paragon, whereas opening and closing files are faster on the SP. We also discuss how the I/O performance of this application could be improved; particularly, we believe that this application would benefit from using collective I/O.
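The instrumentation step could in principle look like the following hedged sketch (a made-up wrapper, not the tracing code used in the study), which records the size and elapsed time of each write for later analysis.

    # Sketch of I/O instrumentation: wrap write calls and log (operation, bytes,
    # seconds) tuples to a trace that can be analyzed after the run.
    import time

    class TracedFile:
        def __init__(self, path, mode="wb"):
            self.f, self.trace = open(path, mode), []
        def write(self, data):
            t0 = time.perf_counter()
            n = self.f.write(data)
            self.trace.append(("write", len(data), time.perf_counter() - t0))
            return n
        def close(self):
            self.f.close()

    tf = TracedFile("trace_demo.dat")
    tf.write(b"x" * (1 << 20))             # one fairly large 1 MB write
    tf.close()
    print(tf.trace)                        # [(op, bytes, seconds), ...]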
S. Zafar, Y-Q. Zhang, and B. Jabbari, "Block-Classified Bidirectional Motion Compensation Scheme for Wavelet-Decomposed Digital Video," Preprint MCS-P535-0895, November 1995. In this paper we introduce a block-classified bidirectional motion compensation scheme for our previously developed wavelet-based video codec, where multiresolution motion estimation is performed in the wavelet domain. The frame classification structure described in this paper is similar to that used in the MPEG standard. Specifically, the I-frames are intraframe coded, the P-frames are interpolated from a previous I- or P-frame, and the B-frames are bidirectionally interpolated frames. We apply this frame classification structure to the wavelet domain with variable block sizes and multiresolution representation. We use a symmetric bidirectional scheme for the B-frames and classify the motion blocks as intraframe, compensated either from the preceding or the following frame, or bidirectional (i.e., compensated based on whichever mode yields the minimum energy). We also introduce the concept of F-frames, which are analogous to P-frames but are predicted from the following frame only. This improves the overall quality of the reconstruction in a group of pictures (GOP) but at the expense of extra buffering. We also study the effect of quantization of the I-frames on the reconstruction of a GOP, and we provide an intuitive explanation for the results.
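A schematic sketch of the minimum-energy classification idea follows (the block data and prediction arrays are synthetic, and the energy measure is a stand-in for the codec's actual criterion).

    # Sketch: classify a block by comparing residual energies of candidate modes
    # and keeping the cheapest one. Not the authors' codec; illustrative only.
    import numpy as np

    def classify_block(block, pred_prev, pred_next):
        candidates = {
            "forward":       block - pred_prev,
            "backward":      block - pred_next,
            "bidirectional": block - 0.5 * (pred_prev + pred_next),
            "intra":         block - block.mean(),
        }
        energies = {mode: float(np.sum(res ** 2)) for mode, res in candidates.items()}
        return min(energies, key=energies.get), energies

    rng = np.random.default_rng(1)
    blk = rng.standard_normal((8, 8))
    mode, energies = classify_block(blk,
                                    blk + 0.1 * rng.standard_normal((8, 8)),
                                    0.2 * blk)
    print(mode)                            # "forward" wins for this synthetic block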
John Drake, Ian Foster, John Michalakes, Brian Toonen, and Patrick Worley, "Design and Performance of a Scalable Parallel Community Climate Model," Preprint MCS-P536-0895, September 1995. We describe the design of a parallel global atmospheric circulation model, PCCM2. This parallel model is functionally equivalent to the National Center for Atmospheric Research's Community Climate Model, CCM2, but is structured to exploit distributed memory multicomputers. PCCM2 incorporates parallel spectral transform, semi-Lagrangian transport, and load balancing algorithms. We present detailed performance results on the IBM SP2 and Intel Paragon. These results provide insights into the scalability of the individual parallel algorithms and of the parallel model as a whole.
D. W. Braun, G. W. Crabtree, H. G. Kaper, A. E. Koshelev, G. K. Leaf, D. M. Levine, and V. M. Vinokur, "The Structure of a Moving Vortex Lattice," Physical Review Letters 76, no. 5 (January 1996), pp. 831-834. Also Preprint MCS-P537-0895, December 1995. Numerical solutions of the time-dependent Ginzburg-Landau equations show a new mechanism for plastic motion of a driven vortex lattice in a clean superconductor. The mechanism, which involves the creation of a defect superstructure, is intrinsic to the moving vortex lattice and is independent of bulk pinning. Other structural features found in the solutions include a re-orientation of the vortex lattice and a gradual healing of lattice defects under the influence of a transport current.
K. Forsman, W. Gropp, L. Kettunen, D. Levine, and J. Salonen, "Solution of Dense Systems of Linear Equations Arising from Integral Equation Formulations," Preprint MCS-P538-0895. This paper discusses efficient solution of dense systems of linear equations arising from integral equation formulations. Several preconditioners in connection with Krylov iterative solvers are examined and compared with LU factorization. Results are shown demonstrating practical aspects and issues we have encountered in implementing iterative solvers on both parallel and sequential computers.
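The comparison described might be prototyped along the following lines (a sketch with a synthetic, well-conditioned dense system and a simple Jacobi preconditioner, not the preconditioners or integral-equation systems studied in the paper).

    # Sketch: direct LU solve versus preconditioned GMRES on a dense system.
    import numpy as np
    from scipy.linalg import lu_factor, lu_solve
    from scipy.sparse.linalg import gmres, LinearOperator

    rng = np.random.default_rng(2)
    n = 300
    A = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # synthetic well-conditioned system
    b = rng.standard_normal(n)

    x_lu = lu_solve(lu_factor(A), b)                     # direct solution

    M = LinearOperator((n, n), matvec=lambda v: v / np.diag(A))   # Jacobi preconditioner
    x_it, info = gmres(A, b, M=M)
    print(info, np.linalg.norm(x_it - x_lu))             # info == 0 means GMRES converged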
Jorge J. More' and Zhijun Wu, "Global Smoothing and Continuation for Large-Scale Molecular Optimization," Preprint MCS-P539-1095. We discuss the formulation of optimization problems that arise in the study of distance geometry, ionic systems, and molecular clusters. We show that continuation techniques based on global smoothing are applicable to these molecular optimization problems, and we outline the issues that must be resolved in the solution of large-scale molecular optimization problems.
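A hedged one-dimensional sketch of smoothing-based continuation follows (the objective, smoothing schedule, and quadrature are illustrative assumptions, not the paper's formulation): the smoothed problem is solved for a decreasing sequence of smoothing parameters, with each minimizer used to start the next, less-smoothed problem.

    # Continuation with Gaussian smoothing on a wiggly 1-D objective. The
    # smoothed value is <f>_lam(x) = (1/sqrt(pi)) * integral f(x + lam*t) e^{-t^2} dt,
    # approximated here by Gauss-Hermite quadrature.
    import numpy as np
    from scipy.optimize import minimize

    def f(x):                                    # illustrative test objective
        return 0.05 * x**2 + np.sin(3.0 * x)

    t, w = np.polynomial.hermite.hermgauss(40)   # Gauss-Hermite nodes and weights

    def smoothed(x, lam):
        return float(np.dot(w, f(x + lam * t)) / np.sqrt(np.pi))

    x = np.array([4.0])                          # poor starting point
    for lam in (4.0, 2.0, 1.0, 0.5, 0.0):
        obj = (lambda z, l=lam: smoothed(z[0], l) if l > 0 else float(f(z[0])))
        x = minimize(obj, x).x                   # warm start from the previous minimizer
    print(x[0], f(x[0]))                         # ends near the global minimizer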
L. Wos, "OTTER and the Moufang Identity Problem," Journal of Automated Reasoning (to appear). Also Preprint MCS-P540-0895, November 1995. This article provides additional evidence of the value of using an automated reasoning program as a research assistant. Featured is the use of Bill McCune's program OTTER to find proofs of theorems taken from the study of Moufang loops, but not just any proofs. Specifically, the proofs satisfy the property of purity. In particular, when given, say, four equivalent identities (which is the case in this article), one is asked to prove the second identity from the first, the third from the second, the fourth from the third, and the first from the fourth. If the proof that 1 implies 2 does not rely on 3 or 4, then by definition the proof is pure with respect to 3 and 4, or simply the proof is pure. If for the four identities one finds four pure proofs showing that 1 implies 2, 2 implies 3, 3 implies 4, and 4 implies 1, then by definition one has found a circle of pure proofs. By finding the needed twelve pure proofs, this article shows that there does exist a circle of pure proofs for the four equivalent identities for Moufang loops and for all orderings of the identities; however, for much of this article, the emphasis is on the first three identities. In addition--in part to promote the use of automated reasoning programs and to answer questions concerning the choice of options--featured here is the methodology that was employed and a discussion of some of the obstacles, some of which are subtle. The approach relies on paramodulation (which generalizes equality substitution), on demodulation, and--so crucial for attacking deep questions and hard problems--on various strategies, the most important of which are the hot list strategy, the set of support strategy, and McCune's ratio strategy. To permit verification of the results presented here, extension of them, and application of the methodology to other unrelated fields, a sample input file and four proofs (relevant to a circle of pure proofs for the four identities) are included. Research topics and challenges are offered at the close of this article.
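The purity bookkeeping can be pictured with a tiny sketch (the dependency sets below are hypothetical, and OTTER itself is of course not involved): a proof of "i implies j" is pure if it relies on nothing beyond its hypothesis and conclusion, and a circle of pure proofs strings such proofs around all the identities.

    # Sketch: check purity of each proof and of a whole (hypothetical) circle.
    def is_pure(hypothesis, conclusion, identities_used):
        return set(identities_used) <= {hypothesis, conclusion}

    circle = {(1, 2): {1, 2}, (2, 3): {2, 3}, (3, 4): {3}, (4, 1): {4, 1}}
    print(all(is_pure(h, c, used) for (h, c), used in circle.items()))   # True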
Steven A. Gabriel and Jorge J. More', "Smoothing of Mixed Complementarity Problems," Preprint MCS-P541-0995, September 1995. We introduce a smoothing approach to the mixed complementarity problem, and study the limiting behavior of a path defined by approximate minimizers of a nonlinear least squares problem. Our main result guarantees that, under a mild regularity condition, limit points of the iterates are solutions to the mixed complementarity problem. The analysis is applicable to a wide variety of algorithms suitable for large-scale mixed complementarity problems.
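One standard way to smooth the complementarity condition min(a, b) = 0 is shown below; it is offered only as an illustration and is not necessarily the smoothing used by the authors. The smoothed function tends to min(a, b) as the parameter mu tends to zero.

    # Sketch: a smooth approximation of min(a, b), often used to smooth
    # complementarity conditions; converges to min(a, b) as mu -> 0.
    import numpy as np

    def smoothed_min(a, b, mu):
        return 0.5 * (a + b - np.sqrt((a - b) ** 2 + 4.0 * mu ** 2))

    a, b = 0.3, 1.2
    for mu in (1.0, 0.1, 0.01, 0.001):
        print(mu, smoothed_min(a, b, mu))      # approaches min(a, b) = 0.3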
Jorge J. More' and Zhijun Wu, "Smoothing Techniques for Macromolecular Global Optimization," Preprint MCS-P542-0995, September 1995. We study global optimization problems that arise in macromolecular modeling, and the solution of these problems via continuation and smoothing. Our results unify and extend the theory associated with the use of the Gaussian transform for smoothing. We show that the Gaussian transform can be viewed as a special case of a generalized transform and that these generalized transforms share many of the properties of the Gaussian transform. We also show that the smoothing behavior of the generalized transform can be studied in terms of the Fourier transform; these results indicate that the Gaussian transform has superior smoothing properties.
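A short numerical check of the smoothing behavior of the one-dimensional Gaussian transform (the test function and quadrature are illustrative choices): applied to cos(kx), the transform damps the component by exp(-k^2 lam^2 / 4), which is the Fourier-side view of why high frequencies are suppressed first.

    # Sketch: <f>_lam(x) = (1/sqrt(pi)) * integral f(x + lam*t) e^{-t^2} dt,
    # evaluated by Gauss-Hermite quadrature and compared with the analytic
    # damping factor for a single cosine.
    import numpy as np

    t, w = np.polynomial.hermite.hermgauss(60)

    def gaussian_transform(f, x, lam):
        return float(np.dot(w, f(x + lam * t)) / np.sqrt(np.pi))

    k, lam, x = 3.0, 0.8, 0.4
    numeric  = gaussian_transform(lambda y: np.cos(k * y), x, lam)
    analytic = np.exp(-(k * lam) ** 2 / 4.0) * np.cos(k * x)
    print(numeric, analytic)                   # the two values agree closely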
I. T. Foster, J. L. Tilson, A. F. Wagner, R. Shepard, R. J. Harrison, R. A. Kendall, R. J. Littlefield, "High Performance Computational Chemistry: (I) Scalable Fock Matrix Construction Algorithms," J. of Computational Chemistry 7, no. 1 (1996), pp. 109-123. Also Preprint MCS-P543-1095, November 1995. Several parallel algorithms for Fock matrix construction are described. The algorithms calculate only the unique integrals, distribute the Fock and density matrices over the processors of a massively parallel computer, use blocking techniques to construct the distributed data structures, and use clustering techniques on each processor to maximize data reuse. Algorithms based on both square and row blocked distributions of the Fock and density matrices are described and evaluated. Variants of the algorithms are discussed that use either triple-sort or canonical ordering of integrals, and dynamic or static task clustering schemes. The algorithms are shown to adapt to screening, with communication volume scaling down with computation costs. Modeling techniques are used to characterize algorithm performance. Given the characteristics of existing massively parallel computers, all the algorithms are shown to be highly efficient on problems of moderate size. The algorithms using the row blocked data distribution are the most efficient.
R. J. Harrison, M. F. Guest, R. A. Kendall, D. E. Bernholdt, A. T. Wong, M. Stave, J. L. Anchell, A. C. Hess, R. J. Littlefield, G. L. Fann, J. Nieplocha, G. S. Thomas, D. Elwood, J. Tilson, R. L. Shepard, Al F. Wagner, I. T. Foster, E. Lusk, R. Stevens, "High Performance Computational Chemistry: (II) A Scalable SCF Program," Preprint MCS-P544-1095, December 1995. We discuss issues in developing scalable parallel algorithms and focus in particular on the distribution, as opposed to the replication, of key data structures. Replication of large data structures limits the maximum calculation size by imposing a low ratio of processors to memory. Only applications that distribute both data and computation across processors are truly scalable. The use of shared data structures that may be independently accessed by each process even in a distributed-memory environment greatly simplifies development and provides a significant performance enhancement. We describe tools we have developed to support this programming paradigm. These tools are used to develop a highly efficient and scalable algorithm to perform self-consistent field calculations on molecular systems. A simple and classical strip-mining algorithm suffices to achieve an efficient and scalable Fock matrix construction in which all matrices are fully distributed. By strip-mining over atoms we also exploit all available sparsity and pave the way to adopting more sophisticated methods for summation of the Coulomb and exchange interactions.
C. H. Bischof and X. Sun, "On Tridiagonalizing and Diagonalizing Symmetric Matrices with Repeated Eigenvalues," SIAM J. Matrix Anal. Appl. 17, no. 4 (October 1996), pp. 869-885. Also Preprint MCS-P545-1095, December 1995. We describe a divide-and-conquer tridiagonalization approach for matrices with repeated eigenvalues. Our algorithm hinges on the fact that, under conditions that are easily verified constructively, a symmetric matrix with bandwidth b and k distinct eigenvalues must be block diagonal with diagonal blocks of size at most bk. A slight modification of the usual orthogonal band-reduction algorithm allows us to reveal this structure, which then leads to potential parallelism in the form of independent diagonal blocks. Compared with the usual Householder reduction algorithm, the new approach exhibits improved data locality, significantly more scope for parallelism, and the potential to reduce arithmetic complexity by close to 50% for matrices that have only two numerically distinct eigenvalues. The actual improvement depends to a large extent on the number of distinct eigenvalues and a good estimate thereof. However, at worst the algorithm behaves like a successive band-reduction approach to tridiagonalization. Moreover, we provide a numerically reliable and effective algorithm for computing the eigenvalue decomposition of a symmetric matrix with two numerically distinct eigenvalues. Such matrices arise, for example, in invariant subspace decomposition approaches to the symmetric eigenvalue problem.
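The special structure available for two numerically distinct eigenvalues can be illustrated as follows (a generic numerical example, not the authors' band-reduction algorithm): such a matrix is annihilated by a quadratic polynomial, so an orthonormal basis of range(A - l2*I) already spans the eigenspace belonging to l1.

    # Sketch: a symmetric matrix with exactly two distinct eigenvalues l1, l2
    # satisfies (A - l1*I)(A - l2*I) = 0; a QR factorization of A - l2*I then
    # yields the l1-eigenspace (leading columns are generically independent).
    import numpy as np

    rng = np.random.default_rng(3)
    n, l1, l2 = 8, 2.0, -1.0
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = Q @ np.diag([l1] * 3 + [l2] * 5) @ Q.T           # exactly two distinct eigenvalues
    I = np.eye(n)

    print(np.linalg.norm((A - l1 * I) @ (A - l2 * I)))   # ~ 0: quadratic polynomial annihilates A
    U, _ = np.linalg.qr(A - l2 * I)
    U1 = U[:, :3]                                        # basis of range(A - l2*I)
    print(np.linalg.norm(A @ U1 - l1 * U1))              # ~ 0: U1 spans the l1-eigenspace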
Yu Kuo, Sohail Zafar, and Bijan Jabbari, "Traffic Characterization and Modeling of Wavelet-based VBR Encoded Video," Preprint MCS-P546-1195. Wavelet-based video codecs provide a hierarchical structure for the encoded data, which can cater to a wide variety of applications such as multimedia systems. The characteristics of such an encoder and its output, however, have not been well examined. In this paper, we investigate the output characteristics of a wavelet-based video codec and develop a composite model to capture the traffic behavior of its output video data. Wavelet decomposition transforms the input video into a hierarchical structure with a number of subimages at different resolutions and scales. The top-level wavelet in this structure contains most of the signal energy. We first describe the characteristics of traffic generated by each subimage and the effect of dropping various subimages at the encoder on the signal-to-noise ratio at the receiver. We then develop an N-state Markov model to describe the traffic behavior of the top wavelet. The behavior of the remaining wavelets is then obtained through estimation, based on the correlations between subimages at the same level of resolution and the wavelets located at the immediately higher level. In our paper, a three-state Markov model is developed. The resulting traffic behavior, described by various statistical properties such as moments and correlations, is then used to validate our model.
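A schematic sketch of fitting a small Markov chain to a frame-size trace follows (the trace is synthetic and the three states are defined by quantiles, which need not match the paper's construction).

    # Sketch: quantize a bit-rate trace into three states and estimate the
    # state-transition probabilities from transition counts.
    import numpy as np

    rng = np.random.default_rng(4)
    rates = np.abs(rng.standard_normal(2000)).cumsum() % 7 + 1   # synthetic bit-rate trace

    edges = np.quantile(rates, [1/3, 2/3])
    states = np.digitize(rates, edges)                           # 0 = low, 1 = medium, 2 = high

    counts = np.zeros((3, 3))
    for s, s_next in zip(states[:-1], states[1:]):
        counts[s, s_next] += 1
    P = counts / counts.sum(axis=1, keepdims=True)               # transition probability matrix
    print(np.round(P, 3))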
R. B. Lehoucq and J. A. Scott, "An Evaluation of Software for Computing Eigenvalues of Sparse Nonsymmetric Matrices," Preprint MCS-P547-1195, January 1996. The past few years have seen a significant increase in research into numerical methods for computing selected eigenvalues of large sparse nonsymmetric matrices. This research has begun to lead to the development of high-quality mathematical software. The software includes codes that implement subspace iteration methods, Arnoldi-based algorithms, and nonsymmetric Lanczos methods. The aim of the current study is to evaluate this state-of-the-art software. In this study we consider subspace iteration and Arnoldi codes. We look at the key features of the codes and their ease of use. Then, using a wide range of test problems, we compare the performance of the codes in terms of storage requirements, execution times, accuracy, and reliability. We also consider their suitability for solving large-scale industrial problems. Based on our findings, we suggest how improved software should be designed.
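The kind of computation performed by the Arnoldi codes under evaluation can be reproduced in miniature with SciPy's ARPACK interface (the test matrix below is synthetic and purely illustrative).

    # Sketch: a few eigenvalues of largest magnitude of a large sparse
    # nonsymmetric matrix via the implicitly restarted Arnoldi method (ARPACK).
    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import eigs

    n = 2000
    A = sp.diags([np.arange(1, n + 1, dtype=float)], [0], format="csr") \
        + sp.random(n, n, density=1e-3, random_state=5, format="csr")

    vals, vecs = eigs(A, k=6, which="LM")       # six eigenvalues of largest magnitude
    print(np.sort_complex(vals))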
Robert Almgren, Andrea Bertozzi, and Michael P. Brenner, "Stable and Unstable Singularities in the Unforced Hele-Shaw Cell," Phys. Fluids 8, no. 6 (June 1996), pp. 1356-1370. Also Preprint MCS-P549-1295, January 1996. We study singularity formation in the lubrication model for the unforced Hele-Shaw system, describing the breaking in two of a fluid droplet confined between two narrowly spaced glass plates. By varying the initial data, we exhibit four different scenarios: (1) the droplet breaks in finite time, with two pinch points moving toward each other and merging at the singular time; (2) the droplet breaks in finite time, with two asymmetric pinch points propagating away from each other; (3) the droplet breaks in finite time, with a single symmetric pinch point; or (4) the droplet relaxes to a stable equilibrium shape without a finite time breakup. Each of the three singular scenarios has a self-similar structure with different scaling laws; the first scenario has not been observed before in other Hele-Shaw studies. We demonstrate instabilities of the second and third scenarios, in which the solution changes its behavior at a thickness that can be arbitrarily small depending on the initial condition. These transitions can be identified by examining the structure of the solution in the intermediate scaling region.
E. Selkov et al., "The Metabolic Pathway Collection from EMP: The Enzymes and Metabolic Pathways Database," Nucleic Acids Res. 24 (1996). Also Preprint MCS-P550-1295, December 1995. The Enzymes and Metabolic Pathways database (EMP) is an encoding of the contents of over 10,000 original publications on the topics of enzymology and metabolism. This large body of information has been transformed into a queryable database. An extraction of over 1400 pictorial representations of metabolic pathways from this collection is freely available on the World Wide Web. We believe that this collection will play an important role in the interpretation of genetic sequence data, as well as offering a meaningful framework for the integration of many other forms of biological data.
G. Quintana-Orti', X. Sun, and C. H. Bischof, "A BLAS-3 Version of the QR Factorization with Column Pivoting," SIAM J. Sci. Comput. 19 (1998), pp. 1486-1494.  Also Preprint MCS-P551-1295, January 1996. The QR factorization with column pivoting (QRP), originally suggested by Golub and Businger in 1965, is a popular approach to computing rank-revealing factorizations. It was implemented in LINPACK with the Level 1 BLAS and in LAPACK with the Level 2 BLAS. While the Level 2 BLAS version generally delivers superior performance, it may result in worse performance for large matrix sizes due to cache effects. We introduce a modification of the QRP algorithm that allows the use of Level 3 BLAS kernels while maintaining the numerical behavior of the LINPACK and LAPACK implementations. Experimental comparisons of this approach with the LINPACK and LAPACK implementations on IBM RS/6000, SGI R8000, and DEC Alpha platforms show considerable performance improvements.
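QR with column pivoting, as implemented in LAPACK, is accessible through SciPy; the sketch below shows its rank-revealing behavior on a synthetic low-rank matrix (illustrative only; the paper concerns the performance of the underlying Fortran implementations, not this interface).

    # Sketch: the diagonal of R from pivoted QR decreases in magnitude, so a
    # threshold on it estimates the numerical rank.
    import numpy as np
    from scipy.linalg import qr

    rng = np.random.default_rng(6)
    A = rng.standard_normal((200, 50)) @ rng.standard_normal((50, 100))   # rank <= 50

    Q, R, piv = qr(A, mode="economic", pivoting=True)
    d = np.abs(np.diag(R))
    print(np.sum(d > 1e-10 * d[0]))        # estimated numerical rank (about 50)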
Ali Bouaricha and Robert B. Schnabel, "Tensor Methods for Large, Sparse Nonlinear Least Squares Problems," Preprint MCS-P552-1295, February 1996. This paper introduces tensor methods for solving large, sparse nonlinear least squares problems where the Jacobian either is analytically available or is computed by finite difference approximations. Tensor methods have been shown to have very good computational performance for small to medium-sized, dense nonlinear least squares problems. In this paper we consider the application of tensor methods to large, sparse nonlinear least squares problems. This involves an entirely new way of solving the tensor model that is efficient for sparse problems. A number of interesting linear algebraic implementation issues are addressed. The test results of the tensor method applied to a set of sparse nonlinear least squares problems compared with those of the standard Gauss-Newton method reveal that the tensor method is significantly more robust and efficient than the standard Gauss-Newton method.
L. Krusin-Elbaum, A. D. Marwick, R. Wheeler, C. Feidl, V. M. Vinokur, G. K. Leaf, and M. Palumbo, "Enhanced Pinning with Controlled Splay Configurations of Columnar Defects; Rapid Vortex Motion at Large Angles," Physical Review Letters (to appear). Preprint MCS-P554-1295. Orders-of-magnitude enhancements of persistent currents J are reported in YBa_2Cu_3O_{7-\delta} with columnar defects arranged in a variety of splayed configurations. The largest J is obtained for a planar distribution P_{pl}(\Theta), with a splay angle \Theta_{opt} = \pm 5 deg. A comparison of P_{pl}(\Theta) and a Gaussian distribution P_G(\Theta) suggests that pinning by the latter is controlled by the large-angle tails of the Gaussian, which appear to enhance the thermal creep rate. Numerical simulations confirm the existence of regimes where vortex motion is promoted rather than suppressed by splay.
Carl F. Ollivier-Gooch, "Quasi-ENO Schemes for Unstructured Meshes Based on Unlimited Data-Dependent Least-Squares Reconstruction," Preprint MCS-P555-1295, January 1996. A crucial step in obtaining high-order accurate steady-state solutions to the Euler and Navier-Stokes equations is the high-order accurate reconstruction of the solution from cell-averaged values. Only after this reconstruction has been completed can the flux integral around a control volume be accurately assessed. In this work, a new reconstruction scheme is presented that is conservative, uniformly accurate with no overshoots, easy to implement on arbitrary meshes, has good convergence properties, and is computationally efficient. The new scheme, called DD-L_2, uses a data-dependent weighted least-squares reconstruction with a fixed stencil. The weights are chosen to strongly emphasize smooth data in the reconstruction. Because DD-L_2 is designed in the framework of k-exact reconstruction, existing techniques for implementing such reconstructions on arbitrary meshes can be used. The new scheme satisfies a relaxed version of the ENO criteria. Local accuracy of the reconstruction is optimal except in the case of functions that are continuous but have discontinuous low-order derivatives. The total variation of the reconstruction is bounded by the total variation of the function to within O (\Delta x). The asymptotic behavior of the scheme in reconstructing smooth and piecewise smooth functions is demonstrated. DD-L_2 produces uniformly high-order accurate reconstructions, even in the presence of discontinuities. Two-dimensional flow solutions obtained using DD-L_2 reconstruction are compared with solutions using limited least-squares reconstruction. The solutions are virtually identical. The absence of a limiter reduces the CPU time required for DD-L_2 solutions by 15-20% as compared to limited reconstruction, even though the DD-L_2 gradient computation is slightly more expensive than ordinary least-squares reconstruction.
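A much-simplified one-dimensional sketch of data-dependent weighted least-squares fitting follows (the weighting below is a crude stand-in, not the DD-L_2 weights, and point samples stand in for cell averages): cells whose data differ sharply from the target cell receive small weights, so a jump in the stencil barely influences the fit.

    # Sketch: weighted quadratic least-squares fit with data-dependent weights
    # that de-emphasize non-smooth data in the stencil.
    import numpy as np

    x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])            # stencil cell centers
    u = np.array([ 0.9,  1.0, 1.1, 5.0, 5.1])            # data with a jump on the right
    w = 1.0 / (1e-6 + np.abs(u - u[2])) ** 2              # small weight for "rough" data

    coeffs = np.polyfit(x, u, deg=2, w=np.sqrt(w))        # minimize sum of w_i * residual_i^2
    print(np.polyval(coeffs, 0.0))                        # close to the smooth value 1.1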