| A. K. Bansal, "Establishing a Framework for Comparative
Analysis of Genome Sequences," ANL/MCS-TM-209 (June
1995). |
This report describes a framework and a high-level language
toolkit for the comparative analysis of genome sequence alignments. The framework integrates
information derived from a multiple sequence alignment and a phylogenetic tree
(a hypothetical tree of evolution) to derive new properties about the sequences. Multiple
sequence alignments are treated as an abstract data type, and abstract operations are
described for manipulating an alignment and for deriving mutation-related
information from a phylogenetic tree by superimposing parsimonious analysis. The framework
has been applied to protein alignments to identify constrained columns (in a multiple
sequence alignment) that exhibit evolutionary pressure to preserve a common property
despite mutation. A Prolog toolkit based on the framework has been implemented and
demonstrated on alignments containing 3000 sequences and 3904 columns. |
|
|
| C. Bischof, A. Carle, and P. Khademi, "Fortran 77
Interface Specification to the SparsLinC 1.0 Library," ANL/MCS-TM-196 (May
1995). |
The SparsLinC library, written in C, has been developed for
exploiting sparsity in automatic differentiation of codes. Issues pertaining to the proper
interface to the library from Fortran programs are discussed, including the interpretation
of Fortran INTEGERs as C pointers, and the representation of Fortran precisions
in C. The Appendix contains the full set of Fortran Interfaces to the SparsLinC library. |
|
|
| C. Bischof, A. Carle, P. Khademi, A. Mauer, and P. Hovland,
"ADIFOR 2.0 User's Guide (Rev. D, June 1998)," ANL/MCS-TM-192, June
1998 |
Automatic differentiation is a technique for computing the
derivatives of functions described by computer programs. ADIFOR implements automatic
differentiation by transforming a collection of FORTRAN 77 subroutines that compute a
function f into new FORTRAN 77 subroutines that compute the derivatives of the outputs of
f with respect to a specified set of inputs of f. This guide describes step by step how to
use version 2.0 of ADIFOR to generate derivative code. Familiarity with UNIX and FORTRAN
77 is assumed. |
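As a rough illustration of what source-transformation AD does (this is a hand-written sketch, not actual ADIFOR output, and the hypothetical routine names f and g_f are ours), each assignment in the original routine is augmented with a statement that propagates a derivative by the chain rule:

    /* Original routine: y = x*x + sin(x). */
    #include <math.h>
    void f(double x, double *y) {
        *y = x * x + sin(x);
    }

    /* Derivative-augmented version: g_x is the seed derivative dx/dp,
       and g_y returns dy/dp, computed by applying the chain rule
       statement by statement alongside the original computation. */
    void g_f(double x, double g_x, double *y, double *g_y) {
        *g_y = (2.0 * x + cos(x)) * g_x;
        *y   = x * x + sin(x);
    }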
|
|
| D. W. Braun, G. W. Crabtree, H. G. Kaper, A. E. Koshelev, G.
K. Leaf, D. M. Levine, and V. M. Vinokur, "The Structure of a Moving Vortex
Lattice," ANL/MCS-TM-211
(November 1995). |
Numerical solutions of the time-dependent Ginzburg-Landau
equations show a new mechanism for plastic motion of a driven vortex lattice in a clean
superconductor. The mechanism, which involves the creation of a defect superstructure, is
intrinsic to the moving vortex lattice and is independent of bulk pinning. Other
structural features found in the solutions include a reorientation of the vortex lattice
and a gradual healing of lattice defects under the influence of a transport current. |
|
|
| E. Coskun and M. K. Kwong, "Parallel Solution of the
Time-dependent Ginzburg-Landau Equations and Other Experiences Using BlockComm-Chameleon
and PCN on the IBM SP, Intel iPSC/860, and Clusters of Workstations," ANL-95/49
(September 1995). |
The time-dependent Ginzburg-Landau (TDGL) equations are
considered for modeling a thin-film, finite-size superconductor placed in a magnetic
field; the problem then leads to the use of so-called natural boundary conditions. The
computational domain is partitioned into subdomains, and bond variables are used in
obtaining the corresponding discrete system of equations. An efficient time-differencing
method based on the forward Euler method is developed. Finally, a variable-strength
magnetic field resulting in vortex motion in Type II high-T_c superconducting films is
introduced. We tackled the problem using two state-of-the-art parallel computing
tools, BlockComm/Chameleon and PCN, on two high-performance distributed-memory
supercomputers, the Intel iPSC/860 and the IBM SP1. We also tested the codes on a
cluster of Sun Sparc workstations used as a parallel computing environment. |
|
|
| I. T. Foster and B. R. Toonen, "Load-Balancing
Algorithms for the Parallel Community Climate Model," ANL/MCS-TM-190
(January 1995). |
Implementations of climate models on scalable parallel
computer systems can suffer from load imbalances resulting from temporal and spatial
variations in the amount of computation required for physical parameterizations such as
solar radiation and convective adjustment. We have developed specialized techniques for
correcting such imbalances. These techniques are incorporated in a general-purpose,
programmable load-balancing library that allows the mapping of computation to processors
to be specified as a series of maps generated by a programmer-supplied load-balancing
module. The communication required to move from one map to another is performed
automatically by the library, without programmer intervention. In this paper, we describe
the load-balancing problem and the techniques that we have developed to solve it. We also
describe specific load-balancing algorithms that we have developed for PCCM2, a scalable
parallel implementation of the Community Climate Model, and present experimental results
that demonstrate the effectiveness of these algorithms on parallel computers. The
load-balancing library developed in this work is available for use in other climate
models. |
|
|
| I. Foster, J. Garnett, and S. Tuecke, "Nexus User's
Guide," ANL/MCS-TM-204
(February 1995, draft). |
This document describes the command line arguments and other
features that are common to any program that incorporates the Nexus runtime library. It is
divided into sections describing the execution, debugging, tuning, and profiling of such
programs. It is not intended as a compiler writer's guide, and does not include any
information on the Nexus interface or other internal details. |
|
|
| I. Foster, C. Kesselman, and S. Tuecke, "Nexus: Runtime
Support for Task-Parallel Programming Languages," ANL/MCS-TM-205. |
A runtime system provides a parallel language compiler with
an interface to the low-level facilities required to support interaction between
concurrently executing program components. Nexus is a portable runtime system for
task-parallel programming languages. Distinguishing features of Nexus include its support
for multiple threads of control, dynamic processor acquisition, dynamic address space
creation, a global memory model via interprocessor references, and asynchronous events. In
addition, it supports heterogeneity at multiple levels, allowing a single computation to
utilize different programming languages, executables, processors, and network protocols.
Nexus is currently being used as a compiler target for two task-parallel languages:
Fortran M and Compositional C++. In this report, we present the Nexus design, outline
techniques used to implement Nexus on parallel computers, show how it is used in
compilers, and compare its performance with that of another runtime system. |
|
|
| The PORTS Consortium, "The PORTS0 Interface," ANL/MCS-TM-203 (Feb.
1995, draft). |
The PORTS (POrtable RunTime System) group was established to
address the problems of constructing a common runtime system to be used as a compiler
target for various task- and data-parallel languages. One result of this group's efforts
is the definition of an applications programming interface, the PORTS level-zero interface
(PORTS0). This interface comprises lightweight thread functions and a core set of
reentrant library routines. This report describes the PORTS0 interface. |
|
|
| W. Gropp, "Users Manual for doctext: Producing
Documentation from C Source Code," ANL/MCS-TM-206
(March 1995). |
One of the major problems that software library writers face,
particularly in a research environment, is the generation of documentation. Producing
good, professional-quality documentation is tedious and time consuming. Often, no
documentation is produced. For many users, however, much of the need for documentation may
be satisfied by a brief description of the purpose and use of the routines and their
arguments. Even for more complete, hand-generated documentation, this information provides
a convenient starting point. We describe here a tool that may be used to generate
documentation about programs written in the C language. It uses a structured comment
convention that preserves the original C source code and does not require any additional
files. The markup language is designed to be an almost invisible structured comment in the
C source code, retaining readability in the original source. Documentation in a form
suitable for the Unix man program (nroff), LaTeX, and the World Wide Web (WWW) can be
produced. |
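For illustration only, a structured comment in roughly this style (the exact doctext markup and section names are defined in the manual, and the routine VecNorm shown here is hypothetical) is all the tool needs to emit a man page, LaTeX, or HTML description:

    /*@
       VecNorm - Computes the 2-norm of a vector

       Input Parameters:
    +  v - array containing the vector entries
    -  n - number of entries

       Notes:
       The comment stays in the C source, so the code remains the single
       authoritative copy of the documentation.
    @*/
    double VecNorm(const double *v, int n);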
|
|
| W. Gropp, "Users Manual for tohtml: Producing True
Hypertext Documents from LaTeX," ANL/MCS-TM-207
(March 1995). |
The World Wide Web has made it possible to use and
disseminate documents as "hypertext." One of the major advantages of hypertext
over conventional text is that references to other documents or items can be linked
directly into the document, allowing the easy retrieval of related information. A
collection of documents can also be read this way, jumping from one document to another
based on the interests of the reader. This does require that the hypertext documents be
extensively cross-linked. Unfortunately, most existing documents are designed as linear
documents. Even worse, most authors still think of documents as linear structures, to be
read from front to back. To deal with this situation, a number of tools have been created
that take documents in an existing word-processing system or markup language and generate
"HTML," the hypertext markup language used on the Web. While this process makes
a single document available in a convenient form on the Web, it does not give access to
cross-document linking, a major advantage of hypertext. This manual describes a program,
tohtml, that takes LaTeX input files, as well as files of link information, and
produces a hypertext document that can contain extensive cross-links. A related program,
doctext, aids in the generation of manual pages that can be referenced by a LaTeX
document. |
|
|
| W. Gropp, "Users Manual for bfort: Producing Fortran
Interfaces to C Source Code," ANL/MCS-TM-208
(March 1995). |
In many applications, the most natural computer language to
write in may be different from the most natural language to provide a library in. For
example, many scientific computing applications are written in Fortran, while many
software libraries---particularly those dealing with complicated data structures or
dynamic memory management---are written in C. Providing an interface so that Fortran
programs can call routines written in C can be a tedious and error-prone process. We
describe here a tool that automatically generates a Fortran-callable wrapper for routines
written in C, using only a small, structured comment and the declaration of the routine in
C. This tool has been used on two large software packages, PETSc and the MPICH
implementation of MPI. |
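The flavor of the generated interface can be sketched by hand (the naming and argument conventions below are typical of Unix Fortran compilers of that period, not necessarily what bfort emits on any particular system, and vec_sum is a made-up routine):

    /* C library routine to be made callable from Fortran. */
    int vec_sum(const double *v, int n, double *result);

    /* Fortran-callable wrapper: Fortran passes arguments by reference, and
       many compilers append an underscore to external names, so
       CALL VECSUM(V, N, RES, IERR) reaches this function. */
    void vecsum_(double *v, int *n, double *res, int *ierr) {
        *ierr = vec_sum(v, *n, res);
    }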
|
|
| Paul Hovland, "Using ADIFOR 1.0 to Compute
Hessians," ANL/MCS-TM-195
(April 1995). |
ADIFOR provides a simple means to produce code for the first
derivatives of functions through the technique of automatic differentiation. However, the
fact that ADIFOR currently cannot produce code to compute second derivatives limits its
usefulness for certain applications. This paper describes how ADIFOR and related tools can
be used to produce code that does compute second derivatives and discusses how to use this
code. Conclusions are presented about the limitations of this method and how it might
compare with second-derivative code produced directly by ADIFOR. |
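One standard way to see why a first-derivative tool can suffice (we do not claim this is exactly the report's construction) is through Hessian-vector products: differentiating the gradient code once more in a direction v gives

    \[ \nabla^2 f(x)\, v \;=\; \left.\frac{d}{dt}\, \nabla f(x + t v)\right|_{t=0}, \]

so running the AD-generated gradient code through AD again, seeded with v = e_1, ..., e_n, assembles the full Hessian one column at a time.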
|
|
| David Levine, "Users Guide to the PGAPack Parallel
Genetic Algorithm Library," Technical Report ANL-95/18 (January
1996). |
PGAPack is a parallel genetic algorithm library that is
intended to provide most capabilities desired in a genetic algorithm package, in an
integrated, seamless, and portable manner. Key features of PGAPack include callability
from Fortran or C; operation on uniprocessors, multiprocessors, multicomputers, and
workstation networks; and binary-, integer-, real-, and character-valued native data
types. Moreover, it has an object-oriented, data-structure-neutral design, parameterized
population replacement, and multiple choices for selection, crossover, and mutation
operators. Hill-climbing heuristics are easy to integrate, and the library has an
easy-to-use interface for novice and application users. It also has multiple levels
of access for expert users, full extensibility to support custom operators and new data
types, extensive debugging facilities, and a large set of example problems. |
|
|
| David A. Lifka, Mark W. Henderson, and Karen Rayl,
"Users Guide to the Argonne SP Scheduling System," ANL/MCS-TM-201 (May
1995). |
During the past five years, scientists have discovered that modern
UNIX workstations connected by Ethernet and fiber networks could provide enough
computational performance to compete with the supercomputers of the day. As this concept
became increasingly popular, the need for distributed queuing and scheduling systems
became apparent. Today, supercomputers such as Argonne National Laboratory's IBM SP
system can provide more CPU and networking speed than can be obtained from these networks
of workstations. Because these modern supercomputers look like clusters of workstations,
developers felt that the scheduling systems previously used on workstation clusters
should still apply. After trying to apply some of these scheduling systems to
Argonne's SP environment, it became obvious that the two computing environments have very
different scheduling needs. Recognizing this need, and realizing that no one had addressed
it, we developed a new scheduling system. The approach taken in creating this system was
unique in that user input and interaction were encouraged throughout the development
process. Thus, a scheduler was built that actually worked the way the users wanted it to
work. This document serves a dual purpose: it is both a user's guide and an
administrator's guide for the ANL SP scheduling system. |
|
|
| W. McCune, "A Case Study in Automated Theorem Proving: A
Difficult Problem about Commutators," ANL/MCS-TM-202
(February 1995). |
This paper shows how the automated deduction system OTTER was
used to prove the group theory theorem x**3 = e => [[[y,z],u],v] = e, where e is the
identity, and [x,y] is the commutator x'y'xy. This is a difficult problem for automated
provers, and several lengthy searches were run before a proof was found. Problem
formulation and search strategy played a key role in the success. I believe that ours is
the first automated proof of the theorem. |
|
|
| Michael Paul Mesnier, "The Network-Enabled Optimization
System Server," ANL/MCS-TM-210
(August 1995). |
Mathematical optimization is a technology under constant
change and advancement, drawing upon the most efficient and accurate numerical methods to
date. Further, these methods can be tailored for a specific application or generalized to
accommodate a wider range of problems. This perpetual change creates an ever-growing
field that is often difficult to stay abreast of. Hence the impetus behind the
Network-Enabled Optimization System (NEOS) server, which aims to provide users, both
novice and expert, with a guided tour through the expanding world of optimization. The
NEOS server is responsible for bridging the gap between users and the optimization
software they seek. More specifically, the NEOS server accepts optimization problems
over the Internet and returns a solution to the user either interactively or by e-mail.
This paper discusses the current implementation of the server. |
|
|
| G. D. Pusch, C. Bischof, and A. Carle, "On Automatic
Differentiation of Codes with COMPLEX Arithmetic with Respect to Real Variables," ANL/MCS-TM-188 (June
1995). |
We explore what it means to apply automatic differentiation
with respect to a set of real variables to codes containing complex arithmetic. That is,
both dependent and independent variables with respect to differentiation are real
variables, but in order to exploit features of complex mathematics, part of the code is
expressed by employing complex arithmetic. We investigate how one can apply automatic
differentiation to complex variables if one exploits the homomorphism of the complex
numbers C onto R**2. It turns out that, by and large, the usual rules of differentiation
apply, but subtle differences in special cases arise for sqrt(), abs(), and the power
operator. |
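A minimal worked example of the R**2 view (ours, not taken from the report): writing a complex intermediate as w = u + iv and carrying the pair of real derivatives (du/dx, dv/dx), the product w = z_1 z_2 differentiates as

    \[ \frac{dw}{dx} \;=\; \frac{dz_1}{dx}\, z_2 \;+\; z_1\, \frac{dz_2}{dx}, \]

that is, the usual product rule applied componentwise; the subtle cases mentioned above arise where the complex function is not differentiable as a map on R**2 (e.g., abs at the origin) or has branch cuts (sqrt and the power operator).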
|
|
|
|
|
|
|
| Christian H. Bischof, Alan Carle, Peyvand M. Khademi, and
Gordon Pusch, "Automatic Differentiation: Obtaining Fast and Reliable
Derivatives---Fast," Preprint MCS-P484-1194. |
In this paper, we introduce automatic differentiation as a
method for computing derivatives of large computer codes. After a brief discussion of
methods of differentiating codes, we review automatic differentiation and introduce the
ADIFOR automatic differentiation tool. We highlight some applications of ADIFOR to large
industrial and scientific codes, and discuss the effectiveness and performance of our
approach. Finally, we discuss sparsity in automatic differentiation and introduce the
SparsLinC library. |
|
|
| Ali Bouaricha and Jorge J. More, "Impact of Partial
Separability on Large-Scale Optimization," J. Comput. Optimization Appl. 7,
no. 1 (1997), pp. 27-40. Also Preprint MCS-P487-0195,
November 1995. |
ELSO is an environment for the solution of large-scale
optimization problems. With ELSO the user is required to provide only code for the
evaluation of a partially separable function. ELSO exploits the partial separability
structure of the function to compute the gradient efficiently using automatic
differentiation. We demonstrate ELSO's efficiency by comparing the various options
available in ELSO. Our conclusion is that the hybrid option in ELSO provides performance
comparable to the hand-coded option, while having the significant advantage of not
requiring a hand-coded gradient or the sparsity pattern of the partially separable
function. In our test problems, which have carefully coded gradients, the computing time
for the hybrid AD option is within a factor of two of the hand-coded option. |
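For reference (this is the standard definition, not something specific to ELSO), a function f is partially separable if it can be written as

    \[ f(x) \;=\; \sum_{i=1}^{m} f_i(x), \]

where each element function f_i depends on only a small subset of the components of x; the gradient is then the sum of the sparse element gradients, which is what makes sparsity-aware automatic differentiation effective in this setting.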
|
|
| C. Bischof, A. Bouaricha, P. M. Khademi, and J. J. More',
"Computing Gradients in Large-Scale Optimization Using Automatic
Differentiation," INFORMS J. on Computing 9, no. 2 (Spring 1997). Also
Preprint MCS-P488-0195.
|
The accurate and efficient computation of gradients for
partially separable functions is central to the solution of large-scale optimization
problems, since these functions are ubiquitous in large-scale problems. We describe two
approaches for computing gradients of partially separable functions via automatic
differentiation. In our experiments we employ the ADIFOR (Automatic Differentiation of
Fortran) tool and the SparsLinC (Sparse Linear Combination) library. We use applications
from the MINPACK-2 test problem collection to compare the numerical reliability and
computational efficiency of these approaches with hand-coded derivatives and
approximations based on differences of function values. Our conclusion is that automatic
differentiation is the method of choice, providing code for the efficient computation of
the gradient without the need for tedious hand coding. |
|
|
| M. K. Kwong and B. Lin, "W-Transform Method for
Feature-oriented Multiresolution Image Retrieval," Preprint MCS-P489-0195. |
We present a local-feature-oriented image indexing and
retrieval method based on Kwong and Tang's W-transform. Multiresolution histogram
comparison is an effective method for content-based image indexing and retrieval. However,
most recent approaches perform multiresolution analysis for whole images but do not
exploit the local features present in the images. Since the W-transform is distinguished by its
ability to handle images of arbitrary size, with no periodicity assumptions, it provides a
natural tool for analyzing local image features and building indexing systems based on
such features. In our approach, the histograms of the local features of images are used in
the indexing system. The system not only can retrieve images that are similar or identical
to the query images but also can retrieve images that contain features specified in the
query images, even if the retrieved images as a whole might be very different from the
query images. The local-feature-oriented method also provides a speed advantage over the
global multiresolution histogram comparison method. The feature-oriented approach is
expected to be applicable in managing large-scale image systems such as video databases
and medical image databases. |
|
|
| I. Foster and C. Kesselman, "Language Constructs and
Runtime Systems for Compositional Parallel Programming," Preprint MCS-P490-0195. |
In task-parallel programs, diverse activities can take place
concurrently, and communication and synchronization patterns are complex and not easily
predictable. Previous work has identified compositionality as an important design
principle for task-parallel programs. In this paper, we discuss alternative approaches to
the realization of this principle. We first provide a review and critical analysis of
Strand, an early compositional programming language. We examine the strengths of the
Strand approach and also its weaknesses, which we attribute primarily to the use of a
specialized language. Then, we present an alternative programming language framework that
overcomes these weaknesses. This framework uses simple extensions to existing sequential
languages (C++ and Fortran) and a common runtime system to provide a basis for the
construction of large, task-parallel programs. We also discuss the run-time system
techniques required to support these languages on parallel and distributed computer
systems. |
|
|
| P. Hovland, C. Bischof, D. Spiegelman, and M. Casella,
"Efficient Derivative Codes through Automatic Differentiation and Interface
Contraction: An Application in Biostatistics," SIAM J. Sci. Comput. 18(4)
(1997), pp. 1056-1066. Also Preprint MCS-P491-0195. |
Developing code for computing the first- and higher-order
derivatives of a function by hand can be very time-consuming and is prone to errors.
Automatic differentiation has proven capable of producing derivative codes with very
little effort on the part of the user. Automatic differentiation avoids the truncation
errors characteristic of divided-difference approximations. However, the derivative code
produced by automatic differentiation can be significantly less efficient than one
produced by hand. This shortcoming may be overcome by utilizing insight into the
high-level structure of a computation. This paper focuses on how to take advantage of the
fact that the number of variables passed between subroutines frequently is small compared
with the number of variables with respect to which we wish to differentiate. Such an
"interface contraction," coupled with the associativity of the chain rule for
differentiation, allows us to apply automatic differentiation in a more judicious fashion,
resulting in much more efficient code for the computation of derivatives. A case study
involving the ADIFOR (Automatic Differentiation of Fortran) tool and a program for
maximizing a logistic-normal likelihood function developed from a problem in nutritional
epidemiology is examined, and performance figures are presented. |
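The effect can be stated in one line (a generic statement of the idea, not a quotation from the paper): if the computation factors through a narrow interface, f(x) = g(h(x)) with h mapping R**n to R**k and k much smaller than n, then

    \[ f'(x) \;=\; g'(h(x))\, h'(x), \]

and by the associativity of the chain rule the derivatives of g need to be propagated in only k directions rather than n, which is where the savings come from.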
|
|
| J. N. Lyness and S. Joe, "Triangular Canonical Forms for
Lattice Rules of Prime-Power Order," Preprint MCS-P492-0495. |
In this paper we develop a theory of t-cycle D-Z
representations for s-dimensional lattice rules of prime-power order. Of
particular interest are canonical forms that, by definition, have a D-matrix
consisting of the nontrivial invariants. Among these is a family of triangular
forms, which, besides being canonical, have the defining property that their Z-matrix
is a column-permuted version of a unit upper-triangular matrix. Triangular forms may be
obtained constructively by using sequences of elementary transformations based on
elementary matrix algebra. Our main result is to define a unique canonical form for
prime-power rules. This ultratriangular form is a triangular form, is easy to
recognize, and may be derived in a straightforward manner. |
|
|
| W. McCune and R. Padmanabhan, "Single Identities for
Lattice Theory and for Weakly Associative Lattices," Algebra Universalis 36
(1996), pp. 436-449. Also Preprint MCS-P493-0395. |
We present a single identity for the variety of all lattices
that is much simpler than those previously known to us. We also show that the variety of
weakly associative lattices is one-based, and we present a generalized one-based theorem
for subvarieties of weakly associative lattices that can be defined with absorption laws.
The automated theorem-proving program Otter was used in a substantial way to obtain the
results. |
|
|
| A. J. Conley and H. B. Keller, "Wavy Taylor Vortices in
Plane Couette Flow," Preprint MCS-P495-0195. |
Path-following techniques applied to a spectral approximation
of the solution of the Navier-Stokes Equations have revealed the existence of a new class
of solutions to the plane Couette flow problem. |
|
|
| M. Huang, M. Papka, T. DeFanti, D. Levine, L. Turner, and L.
Kettunen, "Virtual Reality Visualization of Accelerator Magnets," Preprint MCS-P496-0295. |
We describe the use of the CAVE virtual reality visualization
environment as an aid to the design of accelerator magnets. We have modeled an elliptical
multipole wiggler magnet being designed for use at the Advanced Photon Source at Argonne
National Laboratory. The CAVE environment allows us to explore and interact with the 3-D
visualization of the magnet. Capabilities include changing the number of periods of the
magnet displayed, changing the icons used for displaying the magnetic field, and changing
the current in the electromagnet and observing the effect on the magnetic field and
particle beam trajectory through the field. |
|
|
| E. T. Olsen and B. Lin, "A Wavelet Phase Filter for
Emission Tomography," Preprint MCS-P497-0295. |
The presence of a high level of noise is characteristic of
some tomographic imaging techniques such as positron emission tomography. Wavelet methods
can smooth out noise while preserving significant features of images. Mallat et al.
proposed a wavelet-based denoising scheme exploiting wavelet modulus maxima, but the
scheme is sensitive to noise. In this study, we explore the properties of wavelet
phase, with a focus on reconstruction of emission tomography images. Specifically, we
show that the wavelet phase of regular Poisson noise under a Haar-type wavelet transform
converges in distribution to a random variable uniformly distributed on [0, 2 pi). We then
propose three wavelet-phase-based denoising schemes which exploit this property: edge
tracking, local phase variance thresholding, and scale phase variation thresholding.
Numerical results are also presented. The numerical experiments indicate that
wavelet-phase techniques show promise for wavelet-based denoising methods. |
|
|
| D. A. Lifka, "The ANL/IBM SP Scheduling System,"
Preprint MCS-P498-0395.
|
During the past five years, scientists have discovered that modern
UNIX workstations connected by Ethernet and fiber networks could provide enough
computational performance to compete with the supercomputers of the day. As this concept
became increasingly popular, the need for distributed queuing and scheduling systems
became apparent. Systems such as DQS from Florida State were developed and worked very
well. Today, supercomputers like Argonne National Laboratory's IBM SP system can provide
more CPU and networking speed than can be obtained from these networks of workstations.
Because these modern supercomputers look like clusters of workstations, developers
felt that the scheduling systems previously used on workstation clusters should still
apply. After trying to apply some of these scheduling systems to Argonne's SP
environment, it became obvious that the two computing environments have very different
scheduling needs. Recognizing this need, and realizing that no one had addressed it, I
developed a new scheduling system. The approach taken in creating this system was unique
in that user input and interaction were encouraged throughout the development process.
Thus, a scheduler was built that actually "worked" the way the users wanted it
to work. |
|
|
| William Gropp, "An Introduction to Performance Debugging
for Parallel Computers," Preprint MCS-P500-0295, April
1995. |
Programming parallel computers for performance is a difficult
task that requires careful attention to both single-node performance and data exchange
between processors. This paper discusses some of the sources of poor performance, ways to
identify them in an application, and a few ways to address these issues. |
|
|
| Shannon Bradshaw, Thomas Canfield, John Kokinis, Terrence
Disz, "An Interactive Virtual Environment for Finite Element Analysis," Proc.
High-Performance Computing '95, Phoenix, Arizona, April 12-13, 1995. Also Preprint MCS-P501-0395. |
Virtual environments (VE) provide a powerful human-computer
interface that opens the door to exciting new methods of interaction with high-performance
computing applications in several areas of research. We are interested in the use of
virtual environments as a user interface to real-time simulations used in rapid
prototyping procedures. Consequently, we are developing methods for coupling finite
element models of complex mechanical systems with a VE interface for real-time
interaction. |
|
|
| Terrence Disz, Michael Papka, Rick Stevens, Michael
Pellegrino, and Valerie Taylor, "Virtual Reality Visualization of Parallel Molecular
Dynamics Simulation," Proc. High-Performance Computing '95, Phoenix, Arizona, April
12-13, 1995. Also preprint MCS-P502-0395. |
When performing communications mapping experiments for
massively parallel processors, it is important to be able to visualize the mappings and
resulting communications. In a molecular dynamics model, visualization of the atom-to-atom
interaction and the processor mappings provides insight into the effectiveness of the
communications algorithms. The basic quantities available for visualization in a model of
this type are the number of molecules per unit volume and the mass and velocity of each
molecule. The computational information available for visualization is the atom-to-atom
interaction within each time step, the atom-to-processor mapping, and the energy rescaling
events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive,
immersive visualization experiences. |
|
|
| Valerie E. Taylor, Rick Stevens, and Thomas Canfield,
"Performance Models of Interactive, Immersive Visualization for Scientific
Applications," Proc. High Performance for Computer Graphics and Visualization
Conference, July 3-4, 1995, Swansea, U.K. Also Preprint MCS-P503-0395. |
In this paper we develop a performance model for analyzing
the end-to-end lag in a combined supercomputer/virtual environment. We first present a
general model and then use this model to analyze the end-to-end lag of a finite element
simulation with which a user interacts in real time via the CAVE Automatic Virtual Environment.
The simulation is executed on an IBM SP-2 parallel supercomputer. Our model decouples the
viewpoint lag (not involving the simulation) from the interaction lag (using the results
of the simulations). This model allows one to understand the relative contributions to
end-to-end lag of the following components: rendering, tracking, network latency,
simulation time, and various types of synchronization lags. The results of the study
indicate that the rendering and network latency are the major contributors to the
end-to-end lag. |
|
|
| S. A. Gabriel and A. S. Kydes, "A Nonlinear
Complementarity Approach for the National Energy Modeling System," MCS-P504-0395. |
The National Energy Modeling System (NEMS) is a large-scale
mathematical model that computes equilibrium fuel prices and quantities in the U.S. energy
sector. At present, to generate these equilibrium values, NEMS sequentially solves a
collection of linear programs and nonlinear equations. The NEMS solution procedure then
incorporates the solutions of these linear programs and nonlinear equations in a nonlinear
Gauss-Seidel approach. We describe how the current version of NEMS can be formulated as a
particular nonlinear complementarity problem (NCP), thereby possibly avoiding current
convergence problems. In addition, we show that the NCP format is equally valid for a more
general form of NEMS. We also describe several promising approaches for solving the NCP
form of NEMS based on recent Newton-type methods for general NCPs. These approaches share
the feature of needing to solve their direction-finding subproblems only approximately.
Hence, they can effectively exploit the sparsity inherent in the NEMS NCP. |
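For readers unfamiliar with the terminology, the (pure) nonlinear complementarity problem for a mapping F from R**n to R**n is to find x such that

    \[ x \ge 0, \qquad F(x) \ge 0, \qquad x^{T} F(x) = 0; \]

in the NEMS setting the complementarity conditions encode the equilibrium relationships between fuel prices and quantities (our paraphrase of the abstract, not the paper's exact formulation).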
|
|
| J. More' and Z. Wu, "Global Continuation for Distance
Geometry Problems," SIAM J. Optim. 7, no. 3 (August 1997), pp.
814-836. Also Preprint MCS-P505-0395. |
Distance geometry problems arise in the interpretation of NMR
data and in the determination of protein structure. We formulate the distance geometry
problem as a global minimization problem with special structure, and show that global
smoothing techniques and a continuation approach for global optimization can be used to
determine solutions of distance geometry problems with a nearly 100% probability of
success. |
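A common way to pose the problem (consistent with, though not quoted from, this line of work) is as the global minimization

    \[ \min_{x_1,\dots,x_m \in \mathbf{R}^3} \; \sum_{(i,j) \in S} \left( \|x_i - x_j\|^2 - \delta_{ij}^2 \right)^2, \]

where S is the set of atom pairs with known (or estimated) distances \delta_{ij}; the global smoothing and continuation techniques are applied to this highly nonconvex objective.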
|
|
| Man Kam Kwong, P. T. Peter Tang, and Biquan Lin, "A
Real-Time MPEG Software Decoder Using a Portable Message-Passing Library," Preprint MCS-P506-0395. |
We present a real-time MPEG software decoder that uses
message-passing libraries such as MPL, p4, and MPI. The parallel MPEG decoder currently
runs on the IBM SP system but can be easily ported to other parallel machines. This paper
discusses our parallel MPEG decoding algorithm as well as the parallel programming
environment under which it runs. Several technical issues are discussed, including the
balancing of decoding speed, memory limitations, I/O capacities, and the optimization of
MPEG decoding components. This project shows that a real-time portable software MPEG
decoder is feasible on a general-purpose parallel machine. |
|
|
| Christian H. Bischof, "On the Automatic Differentiation
of Computer Programs," Preprint MCS-P507-0495. |
Automatic differentiation (AD) is a methodology for
developing sensitivity-enhanced versions of arbitrary computer programs. In this paper, we
provide some background information on AD and address some frequently asked questions. We
introduce the ADIFOR and ADIC tools for the automatic differentiation of Fortran 77 and
ANSI-C programs, respectively, and give an example of applying ADIFOR in the context of
the optimization of multibody systems. |
|
|
| Steven A. Gabriel, "An NE/SQP Method for the Bounded
Nonlinear Complementarity Problem," Preprint MCS-P508-0695. |
NE/SQP is a recent algorithm that has proven quite effective
for solving the pure and mixed forms of the nonlinear complementarity problem (NCP).
NE/SQP is robust in the sense that its direction-finding subproblems are always solvable;
in addition, the convergence rate of this method is Q-quadratic. In this paper we consider
a generalized version of NE/SQP proposed by Pang and Qi that is suitable for the bounded
NCP. We extend their work by demonstrating a stronger convergence result and then test a
proposed method on several numerical problems. |
|
|
| I. T. Foster, B. Toonen, and P. H. Worley, "Performance
of Massively Parallel Computers for Spectral Atmospheric Models," J. Atmospheric
and Oceanic Technology 13, no. 5 (October 1996), pp. 1031-1045. Also Preprint MCS-P509-0495. |
Massively parallel processing (MPP) computer systems use
high-speed interconnection networks to link hundreds or thousands of RISC microprocessors.
With each microprocessor having a peak performance of 100 or more Mflops/sec, there is at
least the possibility of achieving very high performance. However, the question of exactly
how to achieve this performance remains unanswered. MPP systems and vector multiprocessors
require very different coding styles. Different MPP systems have widely varying
architectures and performance characteristics. For most problems, a range of different
parallel algorithms is possible, again with varying performance characteristics. In this
paper, we provide a detailed, fair evaluation of MPP performance for a weather and climate
modeling application. Using a specially designed spectral transform code, we study
performance on three different MPP systems: Intel Paragon, IBM SP2, and Cray T3D. We take
great care to control for performance differences due to varying algorithmic
characteristics. The results yield insights into MPP performance characteristics, parallel
spectral transform algorithms, and coding style for MPP systems. We conclude that it is
possible to construct parallel models that achieve multi-Gflops/sec performance on a range
of MPPs, if the models are constructed to allow runtime selection of appropriate
algorithms. |
|
|
| M. Garbey and H. G. Kaper, "Heterogeneous Domain
Decomposition for Singularly Perturbed Elliptic Boundary Value Problems," SIAM J.
Numer. Anal. 34, no. 4 (August 1997), pp. 1513-1544. Also Preprint MCS-P510-0495. |
A heterogeneous domain-decomposition method is presented for
the numerical solution of singularly perturbed elliptic boundary value problems. The
method, which is parallelizable at various levels, uses several ideas of asymptotic
analysis. The subdomains match the domains of validity of the local ("inner" and
"outer") asymptotic expansions, and cut-off functions are used to match
solutions in neighboring subdomains. The positions of the interfaces, as well as the mesh
widths, depend on the small parameter, e. On the subdomains, iterative solution techniques
are used, which may vary from one subdomain to another. The global convergence rate
depends on e; it generally increases like some power of (log(1/e))**(-1) as e
decreases to 0. The method is illustrated on several two-dimensional singular perturbation
problems. |
|
|
| A. J. Conley, "Centrifugal Destabilization and
Restabilization of Plane Shear Flows," Preprint MCS-P511-0395. |
The flow of an incompressible viscous fluid between parallel
plates becomes unstable when the plates are tumbled. As the tumbling rate increases, the
flow restabilizes. This phenomenon is elucidated by path-following techniques. The
solution of the Navier-Stokes equations is approximated by spectral techniques. The linear
stability of these solutions is studied. |
|
|
| Christian H. Bischof, William T. Jones, Andrew Mauer, and
Jamshid Samareh-Abolhassani, "Experiences with the Application of the ADIC Automatic
Differentiation Tool to the CSCMDO 3-D Volume Grid Generation Code," 34th AIAA
Aerospace Sciences Meeting and Exhibit, Jan. 15-18, 1996, Reno, Nev., AIAA 96-0716. Also
Preprint MCS-P512-0595,
October 1995. |
Automatic differentiation (AD) is a methodology for
developing reliable sensitivity-enhanced versions of arbitrary computer programs with
little human effort. It can vastly accelerate the use of advanced simulation codes in
multidisciplinary design optimization, since the time for generating and verifying
derivative codes is greatly reduced. In this paper, we report on the application of the
recently developed ADIC automatic differentiation tool for ANSI C programs to the CSCMDO
multiblock three-dimensional volume grid generator. The ADIC-generated code can easily be
interfaced with Fortran derivative codes generated with the ADIFOR AD tool for FORTRAN 77
programs, thus providing efficient sensitivity-enhancement techniques for multilanguage,
multidiscipline problems. |
|
|
| Robert K. Gjertsen, Jr., Mark T. Jones, and Paul E.
Plassmann, "Parallel Heuristics for Improved, Balanced Graph Colorings,"
Preprint MCS-P513-0595.
|
The computation of good, balanced graph colorings is an
essential part of many algorithms required in scientific and engineering applications.
Motivated by an effective sequential heuristic, we introduce a new parallel heuristic,
PLF, and show that this heuristic has the same expected runtime under the P-RAM
computational model as the scalable coloring heuristic introduced by Jones and Plassmann
(JP). We present experimental results performed on the Intel DELTA that demonstrate that
this new heuristic consistently generates better colorings and requires only slightly more
time than the JP heuristic. In the second part of the paper we introduce two new parallel
color-balancing heuristics, PDR(k) and PLF(k). We show that these heuristics have the
desirable property that they do not increase the number of colors used by an initial
coloring during the balancing process. We present experimental results that show that
these heuristics are very effective in obtaining balanced colorings and, in addition,
exhibit scalable performance. |
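As background for readers unfamiliar with greedy coloring (this is the generic sequential heuristic; the paper's PLF, JP, and balancing heuristics are parallel and considerably more involved), a vertex ordering such as largest-degree-first is fixed and each vertex receives the smallest color not used by an already-colored neighbor:

    #include <stdlib.h>

    /* Greedy coloring of an n-vertex graph in adjacency-list form.
       adj[v] lists the neighbors of v, deg[v] is its length, order[]
       gives the visiting order (e.g., by decreasing degree), and the
       result is written to color[]. */
    void greedy_color(int n, int **adj, const int *deg,
                      const int *order, int *color) {
        int *mark = calloc(n + 1, sizeof(int));
        for (int v = 0; v < n; v++) color[v] = -1;
        for (int k = 0; k < n; k++) {
            int v = order[k];
            for (int j = 0; j < deg[v]; j++) {      /* mark neighbors' colors */
                int c = color[adj[v][j]];
                if (c >= 0) mark[c] = v + 1;
            }
            int c = 0;
            while (mark[c] == v + 1) c++;           /* smallest unmarked color */
            color[v] = c;
        }
        free(mark);
    }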
|
|
| Juan Mario Restrepo, Gary K. Leaf, and Andreas Griewank,
"Circumventing Storage Limitations in Variational Data Assimilation Studies," SIAM
J. on Scientific Computing 19 (1998), pp. 1586-1605. Also
Preprint MCS-P515-0595.
|
The aim of data assimilation is to infer the state of a
system from a geophysical model and possibly incomplete or nonuniformly distributed
spatiotemporal observational data. Used extensively in engineering control theory
applications, data assimilation has relatively recently been introduced into
meteorological forecasting, natural-resource recovery modeling, and climate dynamics.
Variational data assimilation is a promising assimilation technique in which it is assumed
that the state of the system is an extremum of a carefully chosen objective function.
Provided that an adjoint model is available, the required model gradients can be computed
by integrating the model forward and its adjoint backward. The gradients are then used to
extremize the cost function with a suitable iterative or conjugate gradient solver. The
problem we address in this study is the explosive growth in both on-line computer memory
and remote storage requirements of large-scale assimilation studies. This imposes a severe
physical limitation on the size of assimilation studies, even on the largest computers. By
using a recursive strategy, a schedule can be constructed that enables the forward/adjoint
model runs to be performed in such a way that storage requirements can be traded for
longer computational times. This generally applicable strategy enables data assimilation
studies on significantly larger domains than would otherwise be possible given particular
hardware constraints. We show that this tradeoff is indeed viable and that when the
schedule is optimized, the storage and computational times grow at most logarithmically. |
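The flavor of the tradeoff can be shown with a deliberately simplified recursive schedule (the state here is a placeholder scalar, and this halving rule is not the report's optimized schedule): to reverse the steps in [a, b), retain the state at a, recompute forward to the midpoint, reverse the second half, then reverse the first half, so that only one retained state per recursion level, O(log n) in all, is ever held:

    #include <stdio.h>

    static double forward_step(double u, int t) { return 0.5 * u + t; } /* placeholder model step */
    static void adjoint_step(double u, int t) {                          /* placeholder reverse step */
        printf("adjoint of step %d at state %g\n", t, u);
    }

    /* Reverse time steps [a, b) given the state u at step a. */
    static void adjoint_sweep(double u, int a, int b) {
        if (b - a == 1) { adjoint_step(u, a); return; }
        int mid = (a + b) / 2;
        double v = u;
        for (int t = a; t < mid; t++) v = forward_step(v, t);  /* recompute to midpoint */
        adjoint_sweep(v, mid, b);   /* reverse the second half first */
        adjoint_sweep(u, a, mid);   /* then the first half from the retained state */
    }

    int main(void) { adjoint_sweep(1.0, 0, 8); return 0; }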
|
|
| J. N. Lyness and L. M. Delves, "On the Implementation of
a Modified Sag-Szekeres Quadrature Method," Preprint MCS-P516-0595. |
We describe a modified Sag-Szekeres multidimensional
quadrature algorithm and discuss its implementation as a general-purpose library procedure
on serial and parallel architectures. Examples illustrate its effectiveness for both
smooth and singular integrands. |
|
|
| W. McCune and R. Padmanabhan, "An Equational
Characterization of the Conic Construction on Cubic Curves," Preprint MCS-P517-0595. |
An n-ary Steiner law f(x_1,x_2,...,x_n) on
a projective curve \Gamma over an algebraically closed field k is a totally
symmetric n-ary morphism f from \Gamma^n to \Gamma satisfying the
universal identity f(x_1,x_2,...,x_{n-1},f(x_1,x_2,...,x_n)) = x_n . An element e
in \Gamma is called an idempotent for f if f(e,e,...,e) = e. The binary
morphism x*y of the classical chord-tangent construction on a nonsingular cubic
curve is an example of a binary Steiner law on the curve, and the idempotents of * are
precisely the inflection points of the curve. In this paper, we prove that if f
and g are two 5-ary Steiner laws on an elliptic curve \Gamma sharing a common
idempotent, then f=g. We use a new inference rule, =(gL)=>,
extracted from a powerful local-to-global principle in algebraic geometry. This rule is
implemented in the theorem-proving program OTTER. Then we use OTTER to automatically prove
the uniqueness of the 5-ary Steiner law on an elliptic curve. Very much like the binary
case, this theorem provides an algebraic characterization of a geometric construction
process involving conics and cubics. The well-known theorem of the uniqueness of the group
law on such a curve is shown to be a consequence of this result. |
|
|
| Tomasz Bielecki, Li M. Song, Stephen S. T. Yau, and Man Kam
Kwong, "Random Wavelet Transforms, Algebraic Geometric Coding, and Their Applications
in Signal Compression and De-Noising," Preprint MCS-P518-0595. |
The concepts of random wavelet transforms and discrete random
wavelet transforms are introduced. It is shown that these transforms can lead to
simultaneous compression and de-noising of signals that have been corrupted with
fractional noises. Potential applications of algebraic geometric coding theory to encode
the ensuing data are also discussed. |
|
|
| Christian H. Bischof, Peyvand M. Khademi, Ali Bouaricha, and
Alan Carle, "Efficient Computation of Gradients and Jacobians by Transparent
Exploitation of Sparsity in Automatic Differentiation," Optimization Methods and
Software 7 (1996), pp. 1-39. Also Preprint MCS-P519-0595. |
Automatic differentiation (AD) is a technique that augments
computer codes with statements for the computation of derivatives. The computational
workhorse of AD-generated codes for first-order derivatives is the linear combination of
vectors. For many large-scale problems, the vectors involved in this operation are
inherently sparse. If the underlying function is a partially separable one (e.g., if its
Hessian is sparse), many of the intermediate gradient vectors computed by AD will also be
sparse, even though the final gradient is likely to be dense. For large Jacobian
computations, every intermediate derivative vector is usually at least as sparse as the
least sparse row of the final Jacobian. In this paper, we show that transparent
exploitation of the sparsity inherent in derivative computation can result in dramatic
gains in runtime and memory savings. For a set of gradient problems exhibiting implicit
sparsity, we report on the runtime and memory requirements of computing the gradients with
the ADIFOR (Automatic DIfferentiation of FORtran) tool, both with and without employing
the SparsLinC (Sparse Linear Combinations) library, and show
that SparsLinC can reduce runtime and memory costs by orders of magnitude. We also compute
sparse Jacobians using the SparsLinC-based approach---in the process, automatically
detecting the sparsity structure of the Jacobian---and show that these Jacobian results
compare favorably with those of previous techniques that require a priori knowledge of the
sparsity structure of the Jacobian. |
|
|
| Jorge J. More' and Zhijun Wu, "Epsilon-Optimal Solutions
to Distance Geometry Problems via Global Continuation," Preprint MCS-P520-0595. |
We show that a continuation approach to global optimization
with global smoothing techniques can be used to obtain e-optimal solutions to distance
geometry problems. We show that determining an e-optimal solution is still an NP-hard
problem when e is small. A discrete form of the Gaussian transform is proposed based on
the Hermite form of Gaussian quadrature. We show that the modified transform can be used
whenever the transformed functions cannot be computed analytically. Our numerical results
show that the discrete Gauss transform can be used to obtain e-optimal solutions for
general distance geometry problems, and in particular, to determine the three-dimensional
structure of protein fragments. |
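For context (the normalization follows the usual convention in this literature; we are not quoting the paper), the Gaussian transform smooths f on the scale \lambda,

    \[ \langle f \rangle_{\lambda}(x) \;=\; \pi^{-n/2} \int_{\mathbf{R}^n} f(x + \lambda u)\, e^{-\|u\|^2}\, du, \]

and the discrete form referred to above replaces the integral by a Gauss-Hermite quadrature sum, which is what allows the transform to be evaluated when no closed form exists.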
|
|
| Paul Hovland, Steve Altus, Christian Bischof, and Ilan Kroo,
"Using Automatic Differentiation with the Quasi-Procedural Method for
Multidisciplinary Design Optimization," in Proc. 34th AIAA Aerospace Sciences
Meeting, Reno, Nevada, January 15-19, 1996, AIAA 96-0090. Also Preprint MCS-P521-0595. |
As computers have become increasingly powerful, the field of
design optimization has moved toward higher fidelity models in the early stages of design.
One way in which this movement is manifested is in the increasing popularity of
multidisciplinary design optimization (MDO). Because the models used in MDO are large and
complicated, a modular design is desirable. There are many design parameters to optimize,
and the robustness of the method requires that derivatives be computed accurately and
efficiently. This paper describes how the quasi-procedural program architecture developed
by Takai and Kroo and the technique of automatic differentiation can be combined to
address these needs effectively. The two techniques are explained, the manner in which
they were integrated into a single framework is described, and the result of using this
framework for an optimization problem in airplane design is presented. |
|
|
| Larry Wos, "The Power of Combining Resonance with
Heat," J. of Automated Reasoning (to appear). Also Preprint MCS-P522-0695, August
1995. |
In this article, I present experimental evidence of the value
of combining two strategies, each of which has proved powerful in various contexts. The resonance
strategy gives preference (for directing a program's reasoning) to equations or
formulas that have the same shape (ignoring variables) as one of the patterns supplied by
the researcher to be used as a resonator. The hot list strategy rearranges the
order in which conclusions are drawn, the rearranging caused by immediately visiting and,
depending on the value of the heat parameter, even immediately revisiting a set of input
statements chosen by the researcher; the chosen statements are used to complete
applications of inference rules rather than to initiate them. Combining these two
strategies often enables an automated reasoning program to attack deep questions and hard
problems with far more effectiveness than using either alone. The use of this combination
in the context of cursory proof checking produced most unexpected and satisfying results,
as I show here. I present the material (including commentary) in the spirit of excerpts
from an experimenter's notebook, thus meeting the frequent request to illustrate how a
researcher can make wise choices from among the numerous options offered by McCune's
automated reasoning program OTTER. I include challenges and topics for research and, to
aid the researcher, a sample input file and a number of intriguing proofs in the Appendix.
|
|
|
| William D. Reynolds, Jr., "Image Compression Using the
W-Transform," Preprint MCS-P524-0695,
October 1995. |
We present the W-transform for a multiresolution signal
decomposition. One of the differences between the wavelet transform and W-transform is
that the W-transform leads to a nonorthogonal signal decomposition. Another difference
between the two is the manner in which the W-transform handles the endpoints (boundaries)
of the signal. This approach does not restrict the length of the signal to be a power of
two. Furthermore, it does not call for the extension of the signal; thus, the W-transform
is a convenient tool for image compression. We present the basic theory behind the
W-transform and include experimental simulations to demonstrate its capabilities. |
|
|
| S. Zafar, Y-Q. Zhang, and B. Jabbari, "Block-Classified
Motion Compensation Scheme for Digital Video," Preprint MCS-P525-0695,
November 1995. |
A novel scheme for block-based motion compensation is
introduced in which a block is classified according to the energy that is directly related
to the motion activity it represents. This classification allows more flexibility in
controlling the bit rate and the signal-to-noise ratio and results in a reduction in
motion search complexity. The method introduced is not dependent on the particular type of
motion search algorithm implemented and can thus be used with any method assuming that the
underlying matching criterion used is minimum absolute difference. It has been shown that
the method is superior to a simple motion compensation algorithm where all blocks are
motion compensated regardless of the energy resulting after the displaced difference. |
|
|
| J. Chen, M. Griem, J. Aarsvold, C.-T. Chen, T. Disz, R.
Hudson, M. K. Kwong, and B. Lin, "A High-Performance Image Analysis and Visualization
Engine for Three-dimensional Microscopy Imaging," Preprint MCS-P526-0695. |
Three-dimensional microscopy imaging is important for
understanding complex biological assemblies. Computers and digital image acquisition
systems have made possible the three-dimensional reconstruction of images obtained from
optical microscopes. However, since processing such images requires tremendous CPU,
storage, and I/O capacities, a high-performance computing facility is necessary in order
to achieve real-time performance. But it is uneconomical (if not impossible) for a medical
center or a biological research institution to operate a large computer system at the
present time. The advent of high-speed networks such as asynchronous transfer mode (ATM)
enables biologists and radiologists to use remote high-performance computers to process
such images and visualize reconstructed 3D images in real time. In this paper, we present
such a prototype system that integrates microscopy image acquisition, parallel deblurring,
3D image visualization, and high-speed networks. |
|
|
| Jacqueline Fleckinger-Pelle' and Hans G. Kaper, "Gauges
for the Ginzburg-Landau Equations of Superconductivity," ZAMM - Z. angew. Math.
Mech. 76 (1996) S2, pp. 345-348. Also Preprint MCS-P527-0795. |
This note discusses some gauge choices for the time-dependent
Ginzburg-Landau equations of superconductivity. The equations model the state of a
superconducting sample in a magnetic field near the critical temperature. Any two
solutions related through a "gauge transformation" describe the same state and
are physically indistinguishable. This "gauge invariance" can be exploited for
analytical and numerical purposes. |
|
|
| Man Kam Kwong, "Orthogonally Compensated W-Multiresolution
Analysis and Signal Processing," Preprint MCS-P528-0795. |
The concept of a W-matrix is used to give an
elementary interpretation of a biorthogonal wavelet decomposition of signals. We also give
a method to modify the decomposition to give an orthogonal projection on the space spanned
by the scaling vectors. Roughly speaking, our treatment is a finite-length analog of the
well-known theory of multiresolution analysis of Meyer and Mallat. Our approach differs in
that it deals directly with the discrete case, it takes care of the boundary elements
without explicit padding, and it uses a notion similar to that of semiorthogonality
introduced by Chui. Our algorithm has flexibility in the choice of filter coefficients.
The decomposition, orthogonalization, and restoration algorithms are computationally fast.
|
|
|
| Rajeev Thakur and Alok Choudhary, "An Extended Two-Phase
Method for Accessing Sections of Out-of-Core Arrays," Preprint MCS-P530-0595, Sept.
1995. |
A number of applications on parallel computers deal with very
large data sets that cannot fit in main memory. In such applications, data must be stored
in files on disks and fetched into memory during program execution. Parallel programs with
large out-of-core arrays stored in files must read/write smaller sections of the arrays
from/to files. In this paper, we describe a method for accessing sections of out-of-core
arrays efficiently. Our method, the extended two-phase method, uses collective I/O:
Processors cooperate to combine several I/O requests into fewer larger granularity
requests, reorder requests so that the file is accessed in proper sequence, and eliminate
simultaneous I/O requests for the same data. In addition, the I/O workload is divided
among processors dynamically, depending on the access requests. We present performance
results obtained from two real out-of-core parallel applications---matrix multiplication
and a Laplace's equation solver---and several synthetic access patterns, all on the Intel
Touchstone Delta. These results indicate that the extended two-phase method significantly
outperformed a direct (non-collective) method for accessing out-of-core array sections. |
|
|
| Christian H. Bischof, Gordon D. Pusch, and Ralf Knoesel,
"Sensitivity Analysis of the MM5 Weather Model Using Automatic Differentiation,"
Computers in Physics 10, no. 6 (1996), pp. 605-612. Also Preprint MCS-P532-0895,
December 1995. |
We present a general method for using automatic
differentiation to facilitate model sensitivity analysis. Automatic differentiation
techniques augment, in a completely mechanical fashion, an existing code such that it also
simultaneously and efficiently computes derivatives. Our method allows the sensitivities
of the code's outputs to its parameters and inputs to be determined with minimal human
effort by exploiting the relationship between differentiation and formal perturbation
theory. Employing this method, we performed a sensitivity study of the MM5 code, a
mesoscale weather model jointly developed by Penn State University and the National Center
for Atmospheric Research, comprising roughly 40,000 lines of Fortran 77 code. Our results
show that AD-computed sensitivities exhibit superior accuracy compared with divided
difference approximations computed from finite-amplitude perturbations, while consuming
comparable or less CPU time and less human labor. We also comment on a numerically induced
precursor wave that would almost certainly have been undetectable if one used a divided
difference method. |
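The accuracy comparison can be illustrated in miniature (a generic forward-mode example with an arbitrary test function; ADIFOR itself is a source transformation tool for Fortran 77, and MM5 is far larger): automatic differentiation propagates exact derivative values alongside function values, whereas a divided difference suffers both truncation and cancellation error.

    import math

    class Dual:
        """Minimal forward-mode AD value carrying f and df/dx together."""
        def __init__(self, val, dot=0.0):
            self.val, self.dot = val, dot
        def __mul__(self, other):
            other = other if isinstance(other, Dual) else Dual(other)
            return Dual(self.val * other.val,
                        self.dot * other.val + self.val * other.dot)
        __rmul__ = __mul__
        def sin(self):
            return Dual(math.sin(self.val), math.cos(self.val) * self.dot)

    def f(x):          # arbitrary test function; works on floats and Dual numbers
        return x * x.sin() if isinstance(x, Dual) else x * math.sin(x)

    x0, h = 1.3, 1e-7
    ad = f(Dual(x0, 1.0)).dot                  # derivative by forward-mode AD
    dd = (f(x0 + h) - f(x0 - h)) / (2 * h)     # central divided difference
    true = math.sin(x0) + x0 * math.cos(x0)
    print(abs(ad - true), abs(dd - true))      # AD error ~ roundoff; the divided-difference
                                               # error is several orders of magnitude larger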
|
|
| Rajeev Thakur, Ewing Lusk, and William Gropp, "I/O
Characterization of a Portable Astrophysics Application on the IBM SP and Intel
Paragon," Preprint MCS-P534-0895, August
1995. |
Many large-scale applications on parallel machines are
bottlenecked by the I/O performance rather than the CPU or communication performance of
the system. To improve the I/O performance, it is first necessary for system designers to
understand the I/O requirements of various applications. This paper presents the results
of a study of the I/O characteristics and performance of a real, I/O-intensive, portable,
parallel application in astrophysics, on two different parallel machines---the IBM SP and
the Intel Paragon. We instrumented the source code to record all I/O activity and analyzed
the resulting trace files. Our results show that, for this application, the I/O consists
of fairly large writes, and writing data to files is faster on the Paragon, whereas
opening and closing files are faster on the SP. We also discuss how the I/O performance of
this application could be improved; particularly, we believe that this application would
benefit from using collective I/O. |
|
|
| S. Zafar, Y-Q. Zhang, and B. Jabbari, "Block-Classified
Bidirectional Motion Compensation Scheme for Wavelet-Decomposed Digital Video,"
Preprint MCS-P535-0895,
November 1995. |
In this paper we introduce a block-classified bidirectional
motion compensation scheme for our previously developed wavelet-based video codec, where
multiresolution motion estimation is performed in the wavelet domain. The frame
classification structure described in this paper is similar to that used in the MPEG
standard. Specifically, the I-frames are intraframe coded, the P-frames are interpolated
from a previous I- or P-frame, and the B-frames are bidirectionally interpolated frames. We
apply this frame classification structure to the wavelet domain with variable block sizes
and multiresolution representation. We use a symmetric bidirectional scheme for the
B-frames and classify the motion blocks as intraframe, compensated either from the
preceding or the following frame, or bidirectional (i.e., compensated based on which type
yields the minimum energy). We also introduce the concept of F-frames, which are analogous
to P-frames but are predicted from the following frame only. This improves the overall
quality of the reconstruction in a group of pictures (GOP) but at the expense of extra
buffering. We also study the effect of quantization of the I-frames on the reconstruction
of a GOP, and we provide an intuitive explanation of the results. |
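The classification rule can be sketched generically as a minimum-residual-energy decision per block (a simplified pixel-domain illustration with an assumed intraframe stand-in, not the wavelet-domain, variable-block-size scheme of the paper):

    import numpy as np

    def classify_block(cur, prev_pred, next_pred):
        """Choose a compensation mode for one B-frame block by minimum residual energy.
        cur, prev_pred, next_pred: equally sized blocks (current block and candidate
        predictions from the preceding and following reference frames)."""
        candidates = {
            "forward":       prev_pred,
            "backward":      next_pred,
            "bidirectional": 0.5 * (prev_pred + next_pred),
            "intra":         np.full_like(cur, cur.mean()),   # crude intraframe stand-in
        }
        energies = {mode: float(np.sum((cur - pred) ** 2))
                    for mode, pred in candidates.items()}
        best = min(energies, key=energies.get)
        return best, energies[best]

    rng = np.random.default_rng(0)
    blk = rng.standard_normal((8, 8))
    mode, energy = classify_block(blk, blk + 0.1 * rng.standard_normal((8, 8)), blk[::-1])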
|
|
| John Drake, Ian Foster, John Michalakes, Brian Toonen, and
Patrick Worley, "Design and Performance of a Scalable Parallel Community Climate
Model," Preprint MCS-P536-0895,
September 1995. |
We describe the design of a parallel global atmospheric
circulation model, PCCM2. This parallel model is functionally equivalent to the National
Center for Atmospheric Research's Community Climate Model, CCM2, but is structured to
exploit distributed memory multicomputers. PCCM2 incorporates parallel spectral transform,
semi-Lagrangian transport, and load balancing algorithms. We present detailed
performance results on the IBM SP2 and Intel Paragon. These results provide insights into
the scalability of the individual parallel algorithms and of the parallel model as a
whole. |
|
|
| D. W. Braun, G. W. Crabtree, H. G. Kaper, A. E. Koshelev, G.
K. Leaf, D. M. Levine, and V. M. Vinokur, "The Structure of a Moving Vortex
Lattice," Physical Review Letters 76, no. 5 (January 1996), pp. 831-834.
Also Preprint MCS-P537-0895,
December 1995. |
Numerical solutions of the time-dependent Ginzburg-Landau
equations show a new mechanism for plastic motion of a driven vortex lattice in a clean
superconductor. The mechanism, which involves the creation of a defect superstructure, is
intrinsic to the moving vortex lattice and is independent of bulk pinning. Other
structural features found in the solutions include a re-orientation of the vortex lattice
and a gradual healing of lattice defects under the influence of a transport current. |
|
|
| K. Forsman, W. Gropp, L. Kettunen, D. Levine, and J. Salonen,
"Solution of Dense Systems of Linear Equations Arising from Integral Equation
Formulations," Preprint MCS-P538-0895. |
This paper discusses efficient solution of dense systems of
linear equations arising from integral equation formulations. Several preconditioners in
connection with Krylov iterative solvers are examined and compared with LU factorization.
Results are shown demonstrating practical aspects and issues we have encountered in
implementing iterative solvers on both parallel and sequential computers. |
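A minimal illustration of the comparison (a stand-in dense system and a simple Jacobi preconditioner, chosen for brevity; the paper's matrices come from integral-equation discretizations and its preconditioners are more elaborate):

    import numpy as np
    from scipy.sparse.linalg import gmres

    # Stand-in dense system (not an actual integral-equation matrix): diagonally
    # dominant so that a simple Jacobi (diagonal) preconditioner is reasonable.
    rng = np.random.default_rng(0)
    n = 500
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)

    x_lu = np.linalg.solve(A, b)               # direct solution via LU factorization
    M = np.diag(1.0 / np.diag(A))              # Jacobi preconditioner
    x_it, info = gmres(A, b, M=M)              # preconditioned Krylov (GMRES) solve
    print(info, np.linalg.norm(x_it - x_lu))   # info == 0; agreement to the solver tolerance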
|
|
| Jorge J. More' and Zhijun Wu, ``Global Smoothing and
Continuation for Large-Scale Molecular Optimization,'' Preprint MCS-P539-1095. |
We discuss the formulation of optimization problems that
arise in the study of distance geometry, ionic systems, and molecular clusters. We show
that continuation techniques based on global smoothing are applicable to these molecular
optimization problems, and we outline the issues that must be resolved in the solution of
large-scale molecular optimization problems. |
|
|
| L. Wos, "OTTER and the Moufang Identity Problem," Journal
of Automated Reasoning (to appear). Also Preprint MCS-P540-0895,
November 1995. |
This article provides additional evidence of the value of
using an automated reasoning program as a research assistant. Featured is the use of Bill
McCune's program OTTER to find proofs of theorems taken from the study of Moufang loops,
but not just any proofs. Specifically, the proofs satisfy the property of purity.
In particular, when given, say, four equivalent identities (which is the case in this
article), one is asked to prove the second identity from the first, the third from the
second, the fourth from the third, and the first from the fourth. If the proof that 1
implies 2 does not rely on 3 or 4, then by definition the proof is pure with respect to 3
and 4, or simply the proof is pure. If for the four identities one finds four
pure proofs showing that 1 implies 2, 2 implies 3, 3 implies 4, and 4 implies 1, then by
definition one has found a circle of pure proofs. By finding the needed twelve
pure proofs, this article shows that there does exist a circle of pure proofs for the four
equivalent identities for Moufang loops and for all orderings of the identities;
however, for much of this article, the emphasis is on the first three identities. In
addition--in part to promote the use of automated reasoning programs and to answer
questions concerning the choice of options--featured here is the methodology that was
employed and a discussion of some of the obstacles, some of which are subtle. The approach
relies on paramodulation (which generalizes equality substitution), on demodulation,
and--so crucial for attacking deep questions and hard problems--on various strategies,
most important of which are the hot list strategy, the set of support strategy, and
McCune's ratio strategy. To permit verification of the results presented here, extension
of them, and application of the methodology to other unrelated fields, a sample input file
and four proofs (relevant to a circle of pure proofs for the four identities) are
included. Research topics and challenges are offered at the close of this article. |
|
|
| Steven A. Gabriel and Jorge J. More', "Smoothing of
Mixed Complementarity Problems," Preprint MCS-P541-0995,
September 1995. |
We introduce a smoothing approach to the mixed
complementarity problem, and study the limiting behavior of a path defined by approximate
minimizers of a nonlinear least squares problem. Our main result guarantees that, under a
mild regularity condition, limit points of the iterates are solutions to the mixed
complementarity problem. The analysis is applicable to a wide variety of algorithms
suitable for large-scale mixed complementarity problems. |
|
|
| Jorge J. More' and Zhijun Wu, "Smoothing Techniques for
Macromolecular Global Optimization," Preprint MCS-P542-0995,
September 1995. |
We study global optimization problems that arise in
macromolecular modeling, and the solution of these problems via continuation and
smoothing. Our results unify and extend the theory associated with the use of the Gaussian
transform for smoothing. We show that the Gaussian transform can be viewed as a special
case of a generalized transform and that these generalized transforms share many of the
properties of the Gaussian transform. We also show that the smoothing behavior of the
generalized transform can be studied in terms of the Fourier transform and that these
results indicate that the Gaussian transform has superior smoothing properties. |
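For reference, the Gaussian transform of a function f on R^n that underlies this smoothing is commonly written (the normalization below is the usual convention, stated as background rather than quoted from the preprint) as

    <f>_\lambda(x) = \pi^{-n/2} \lambda^{-n} \int_{R^n} f(y) exp(-||y - x||^2 / \lambda^2) dy,

so that increasing the smoothing parameter \lambda averages f over larger neighborhoods, washing out shallow local minimizers; a continuation method then traces minimizers back as \lambda is decreased toward zero.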
|
|
| I. T. Foster, J. L. Tilson, A. F. Wagner, R. Shepard, R. J.
Harrison, R. A. Kendall, R. J. Littlefield, "High Performance Computational
Chemistry: (I) Scalable Fock Matrix Construction Algorithms," J. of Computational
Chemistry 7, no. 1 (1996), pp. 109-123. Also Preprint MCS-P543-1095,
November 1995. |
Several parallel algorithms for Fock matrix construction are
described. The algorithms calculate only the unique integrals, distribute the Fock and
density matrices over the processors of a massively parallel computer, use blocking
techniques to construct the distributed data structures, and use clustering techniques on
each processor to maximize data reuse. Algorithms based on both square and row blocked
distributions of the Fock and density matrices are described and evaluated. Variants of
the algorithms are discussed that use either triple-sort or canonical ordering of
integrals, and dynamic or static task clustering schemes. The algorithms are shown to
adapt to screening, with communication volume scaling down with computation costs.
Modeling techniques are used to characterize algorithm performance. Given the
characteristics of existing massively parallel computers, all the algorithms are shown to
be highly efficient on problems of moderate size. The algorithms using the row blocked
data distribution are the most efficient. |
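To make "calculate only the unique integrals" concrete (a generic enumeration sketch, not the blocked, clustered parallel scheme of the paper; the basis-set size below is arbitrary), the two-electron integrals (ij|kl) carry an eightfold permutational symmetry, so a canonical ordering enumerates only indices with i >= j, k >= l, and composite index ij >= kl:

    def unique_integrals(nbf):
        """Yield canonically ordered quadruples (i, j, k, l) with i >= j, k >= l, and
        composite index ij >= kl, covering each symmetry-unique integral (ij|kl) once."""
        for i in range(nbf):
            for j in range(i + 1):
                ij = i * (i + 1) // 2 + j
                for k in range(i + 1):
                    for l in range(k + 1):
                        kl = k * (k + 1) // 2 + l
                        if ij >= kl:
                            yield (i, j, k, l)

    # With nbf basis functions there are m*(m+1)/2 unique integrals, m = nbf*(nbf+1)/2.
    nbf = 10
    m = nbf * (nbf + 1) // 2
    assert sum(1 for _ in unique_integrals(nbf)) == m * (m + 1) // 2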
|
|
| R. J. Harrison, M. F. Guest, R. A. Kendall, D. E. Bernholdt,
A. T. Wong, M. Stave, J. L. Anchell, A. C. Hess, R. J. Littlefield, G. L. Fann, J.
Nieplocha, G. S. Thomas, D. Elwood, J. Tilson, R. L. Shepard, Al F. Wagner, I. T. Foster,
E. Lusk, R. Stevens, "High Performance Computational Chemistry: (II) A Scalable SCF
Program," Preprint MCS-P544-1095,
December 1995. |
We discuss issues in developing scalable parallel algorithms
and focus in particular on the distribution, as opposed to the replication, of key data
structures. Replication of large data structures limits the maximum calculation size by
imposing a low ratio of processors to memory. Only applications which distribute both data
and computation across processors are truly scalable. The use of shared data structures
that may be independently accessed by each process even in a distributed-memory
environment greatly simplifies development and provides a significant performance
enhancement. We describe tools we have developed to support this programming paradigm.
These tools are used to develop a highly efficient and scalable algorithm to perform
self-consistent field calculations on molecular systems. A simple and classical
strip-mining algorithm suffices to achieve an efficient and scalable Fock-Matrix
construction in which all matrices are fully distributed. By strip-mining over atoms we
also exploit all available sparsity and pave the way to adopting more sophisticated
methods for summation of the Coulomb and exchange interactions. |
|
|
| C. H. Bischof and X. Sun, "On Tridiagonalizing and
Diagonalizing Symmetric Matrices with Repeated Eigenvalues," SIAM J. Matrix Anal.
Appl. 17, no. 4 (October 1996), pp. 869-885. Also Preprint MCS-P545-1095,
December 1995. |
We describe a divide-and-conquer tridiagonalization approach
for matrices with repeated eigenvalues. Our algorithm hinges on the fact that, under
conditions that are easily verified constructively, a symmetric matrix with bandwidth b
and k distinct eigenvalues must be block diagonal with diagonal blocks of size at
most bk. A slight modification of the usual orthogonal band-reduction algorithm
allows us to reveal this structure, which then leads to potential parallelism in the form
of independent diagonal blocks. Compared with the usual Householder reduction algorithm,
the new approach exhibits improved data locality, significantly more scope for
parallelism, and the potential to reduce arithmetic complexity by close to 50% for
matrices that have only two numerically distinct eigenvalues. The actual improvement
depends to a large extent on the number of distinct eigenvalues and a good estimate
thereof. However, at worst the algorithms behave like a successive band-reduction approach
to tridiagonalization. Moreover, we provide a numerically reliable and effective algorithm
for computing the eigenvalue decomposition of a symmetric matrix with two numerically
distinct eigenvalues. Such matrices arise, for example, in invariant subspace
decomposition approaches to the symmetric eigenvalue problem. |
|
|
| Yu Kuo, Sohail Zafar, and Bijan Jabbari, "Traffic
Characterization and Modeling of Wavelet-based VBR Encoded Video," Preprint MCS-P546-1195. |
Wavelet-based video codecs provide a hierarchical structure
for the encoded data, which can cater to a wide variety of applications such as multimedia
systems. The characteristics of such an encoder and its output, however, have not been
well examined. In this paper, we investigate the output characteristics of a wavelet-based
video codec and develop a composite model to capture the traffic behavior of its output
video data. Wavelet decomposition transforms the input video in a hierarchical structure
with a number of subimages at different resolutions and scales. The top-level wavelet in
this structure contains most of the signal energy. We first describe the characteristics
of traffic generated by each subimage and the effect of dropping various subimages at the
encoder on the signal-to-noise ratio at the receiver. We then develop an N-state Markov
model to describe the traffic behavior of the top wavelet. The behavior of the remaining
wavelets is then obtained through estimation, based on the correlations between these
subimages at the same level of resolution and those wavelets located at the immediately
higher level. In our paper, a three-state Markov model is developed. The resulting traffic
behavior, described by various statistical properties such as moments and correlations, is
then used to validate our model. |
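A generic discrete-time sketch of such a model (purely illustrative: the states, transition probabilities, and per-state frame-size distributions below are invented, not fitted to the codec of the paper) might look like:

    import numpy as np

    # Hypothetical 3-state model: low/medium/high activity, each state emitting a
    # frame size drawn from its own (assumed) distribution.
    P = np.array([[0.80, 0.15, 0.05],          # row-stochastic transition matrix
                  [0.10, 0.80, 0.10],
                  [0.05, 0.25, 0.70]])
    mean_bits = np.array([20e3, 60e3, 150e3])  # assumed mean bits per frame, per state

    def generate_trace(n_frames, rng=np.random.default_rng(1)):
        """Simulate the chain and emit a per-frame bit-count trace."""
        state, trace = 0, []
        for _ in range(n_frames):
            trace.append(rng.exponential(mean_bits[state]))
            state = rng.choice(3, p=P[state])
        return np.array(trace)

    trace = generate_trace(1000)
    print(trace.mean(), trace.std())   # moments and correlations of the trace are what
                                       # one would compare against real encoder output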
|
|
| R. B. Lehoucq and J. A. Scott, "An Evaluation of
Software for Computing Eigenvalues of Sparse Nonsymmetric Matrices," Preprint MCS-P547-1195,
January 1996. |
The past few years have seen a significant increase in
research into numerical methods for computing selected eigenvalues of large sparse
nonsymmetric matrices. This research has begun to lead to the development of high-quality
mathematical software. The software includes codes that implement subspace iteration
methods, Arnoldi-based algorithms, and nonsymmetric Lanczos methods. The aim of the
current study is to evaluate this state-of-the-art software. In this study we consider
subspace iteration and Arnoldi codes. We look at the key features of the codes and their
ease of use. Then, using a wide range of test problems, we compare the performance of the
codes in terms of storage requirements, execution times, accuracy, and reliability. We
also consider their suitability for solving large-scale industrial problems. Based on our
findings, we suggest how improved software should be designed. |
|
|
| Robert Almgren, Andrea Bertozzi, and Michael P. Brenner,
"Stable and Unstable Singularities in the Unforced Hele-Shaw Cell," Phys.
Fluids 8, no. 6 (June 1996), pp. 1356-1370. Also Preprint MCS-P549-1295, January
1996. |
We study singularity formation in the lubrication model for
the unforced Hele-Shaw system, describing the breaking in two of a fluid droplet confined
between two narrowly spaced glass plates. By varying the initial data, we exhibit four
different scenarios: (1) the droplet breaks in finite time, with two pinch points moving
toward each other and merging at the singular time; (2) the droplet breaks in finite time,
with two asymmetric pinch points propagating away from each other; (3) the droplet breaks
in finite time, with a single symmetric pinch point; or (4) the droplet relaxes to a
stable equilibrium shape without a finite time breakup. Each of the three singular
scenarios has a self-similar structure with different scaling laws; the first scenario has
not been observed before in other Hele-Shaw studies. We demonstrate instabilities of the
second and third scenarios, in which the solution changes its behavior at a thickness that
can be arbitrarily small depending on the initial condition. These transitions can be
identified by examining the structure of the solution in the intermediate scaling region. |
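As background (a commonly used form of this model, stated here for orientation rather than quoted from the paper), the lubrication approximation for the unforced Hele-Shaw cell evolves the fluid thickness h(x, t) by

    h_t + (h h_xxx)_x = 0,

where the linear mobility in h reflects the gap-averaged Hele-Shaw flow and the third derivative arises from surface-tension-controlled pressure; the scenarios above differ in the self-similar form that h takes near a pinch point as h -> 0.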
|
|
| E. Selkov et al., "The Metabolic Pathway Collection from
EMP: The Enzymes and Metabolic Pathways Database," Nucleic Acids Res. 24
(1996). Also Preprint MCS-P550-1295,
December 1995. |
The Enzymes and Metabolic Pathways database (EMP) is an
encoding of the contents of over 10,000 original publications on the topics of enzymology
and metabolism. This large body of information has been transformed into a queryable
database. An extraction of over 1400 pictorial representations of metabolic pathways from
this collection is freely available on the World Wide Web. We believe that this collection
will play an important role in the interpretation of genetic sequence data, as well as
offering a meaningful framework for the integration of many other forms of biological
data. |
|
|
| G. Quintana-Orti', X. Sun, and C. H. Bischof, "A BLAS-3
Version of the QR Factorization with Column Pivoting," SIAM J. Sci. Comput. 19
(1998), pp. 1486-1494. Also Preprint MCS-P551-1295,
January 1996. |
The QR factorization with column pivoting (QRP), originally
suggested by Golub and Businger in 1965, is a popular approach to computing rank-revealing
factorizations. It was implemented in LINPACK with the Level 1 BLAS and in LAPACK with the
Level 2 BLAS. While the Level 2 BLAS version generally delivers superior performance, it
may result in worse performance for large matrix sizes due to cache effects. We introduce
a modification of the QRP algorithm that allows the use of Level 3 BLAS kernels while
maintaining the numerical behavior of the LINPACK and LAPACK implementations. Experimental
comparisons of this approach with the LINPACK and LAPACK implementations on IBM RS/6000,
SGI R8000, and DEC Alpha platforms show considerable performance improvements. |
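For orientation, an unblocked, textbook sketch of QR with column pivoting in the Businger-Golub style is shown below (dense NumPy arithmetic for clarity; the Level 3 BLAS blocked variant of the paper must reproduce this numerical behavior while casting most of the work as matrix-matrix products):

    import numpy as np

    def qrp(A):
        """Unblocked Householder QR with column pivoting: A[:, piv] = Q @ R."""
        R = np.array(A, dtype=float)
        m, n = R.shape
        Q = np.eye(m)
        piv = np.arange(n)
        for k in range(min(m, n)):
            # bring the column with largest trailing norm into position k
            j = k + int(np.argmax(np.linalg.norm(R[k:, k:], axis=0)))
            R[:, [k, j]] = R[:, [j, k]]
            piv[[k, j]] = piv[[j, k]]
            # Householder reflector zeroing R[k+1:, k]
            v = R[k:, k].copy()
            v[0] += np.copysign(np.linalg.norm(v), v[0])
            if np.linalg.norm(v) > 0:
                v /= np.linalg.norm(v)
                R[k:, :] -= 2.0 * np.outer(v, v @ R[k:, :])
                Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v)
        return Q, np.triu(R), piv

    A = np.random.default_rng(0).standard_normal((6, 4))
    Q, R, piv = qrp(A)
    assert np.allclose(A[:, piv], Q @ R)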
|
|
| Ali Bouaricha and Robert B. Schnabel, "Tensor Methods
for Large, Sparse Nonlinear Least Squares Problems," Preprint MCS-P552-1295,
February 1996. |
This paper introduces tensor methods for solving large,
sparse nonlinear least squares problems where the Jacobian either is analytically
available or is computed by finite difference approximations. Tensor methods have been
shown to have very good computational performance for small to medium-sized, dense
nonlinear least squares problems. In this paper we consider the application of tensor
methods to large, sparse nonlinear least squares problems. This involves an entirely new
way of solving the tensor model that is efficient for sparse problems. A number of
interesting linear algebraic implementation issues are addressed. The test results of the
tensor method applied to a set of sparse nonlinear least squares problems compared with
those of the standard Gauss-Newton method reveal that the tensor method is significantly
more robust and efficient than the standard Gauss-Newton method. |
|
|
| L. Krusin-Elbaum, A. D. Marwick, R. Wheeler, C. Feidl, V. M.
Vinokur, G. K. Leaf, and M. Palumbo, "Enhanced Pinning with controlled Splay
Configurations of Columnar Defects; Rapid Vortex Motion at Large Angles," Physical
Review Letters (to appear). Preprint MCS-P554-1295. |
Orders-of-magnitude enhancements of persistent currents J are
reported in YB{sub 2}Cu{sub 3}O{sub 7-\delta} with columnar defects arranged in a variety
of splayed configurations. The largest J is obtained for a planar distribution P{sub
pl}(\Theta}, with a splay angle \Theta{sub opt} = +-5 deg. A comparison of P{sub
pl}(\Theta) and a Gaussian distribution P{sub G}{\Theta} suggests that pinning by the
latter is controlled by large-angle tails of the Gaussian, which appear to enhance
thermal creep rate. Numerical simulations confirm the existence of the regimes where
vortex motion is promoted rather than suppressed by splay. |
|
|
| Carl F. Ollivier-Gooch, "Quasi-ENO Schemes for
Unstructured Meshes Based on Unlimited Data-Dependent Least-Squares Reconstruction,"
Preprint MCS-P555-1295,
January 1996. |
A crucial step in obtaining high-order accurate steady-state
solutions to the Euler and Navier-Stokes equations is the high-order accurate
reconstruction of the solution from cell-averaged values. Only after this reconstruction
has been completed can the flux integral around a control volume be accurately assessed.
In this work, a new reconstruction scheme is presented that is conservative, uniformly
accurate with no overshoots, easy to implement on arbitrary meshes, and computationally
efficient, and that has good convergence properties. The new scheme, called DD-L_2, uses a
data-dependent weighted least-squares reconstruction with a fixed stencil. The weights are
chosen to strongly emphasize smooth data in the reconstruction. Because DD-L_2 is designed
in the framework of k-exact reconstruction, existing techniques for implementing such
reconstructions on arbitrary meshes can be used. The new scheme satisfies a relaxed
version of the ENO criteria. Local accuracy of the reconstruction is optimal except in the
case of functions that are continuous but have discontinuous low-order derivatives. The
total variation of the reconstruction is bounded by the total variation of the function to
within O (\Delta x). The asymptotic behavior of the scheme in reconstructing smooth and
piecewise smooth functions is demonstrated. DD-L_2 produces uniformly high-order accurate
reconstructions, even in the presence of discontinuities. Two-dimensional flow solutions
obtained using DD-L_2 reconstruction are compared with solutions using limited
least-squares reconstruction. The solutions are virtually identical. The absence of a
limiter reduces the CPU time required for DD-L_2 solutions by 15-20% as compared to
limited reconstruction, even though the DD-L_2 gradient computation is slightly more
expensive than ordinary least-squares reconstruction. |
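The data-dependent weighting idea can be illustrated in one dimension (a deliberately simplified sketch: uniform grid, linear reconstruction over a three-cell stencil, and an assumed inverse-difference weight; none of these choices are the actual DD-L_2 design):

    import numpy as np

    def dd_linear_reconstruction(u, i, dx=1.0, eps=1e-12, p=4):
        """Reconstruct a conservative linear polynomial in cell i from cell averages u,
        using the fixed stencil {i-1, i, i+1} and data-dependent least-squares weights
        that de-emphasize neighbors lying across large jumps (illustrative weighting)."""
        a = u[i]                     # for a linear polynomial the cell average equals the
                                     # midpoint value, so this enforces conservation
        neighbors = [i - 1, i + 1]
        w = np.array([1.0 / (abs(u[j] - u[i]) ** p + eps) for j in neighbors])
        d = np.array([(j - i) * dx for j in neighbors])     # neighbor-center offsets
        r = np.array([u[j] - a for j in neighbors])
        b = np.sum(w * d * r) / np.sum(w * d * d)           # weighted least-squares slope
        return a, b                                         # p(x) = a + b * (x - x_i)

    u_smooth = np.array([0.0, 1.0, 2.0, 3.0])
    u_jump = np.array([0.0, 1.0, 2.0, 50.0])
    print(dd_linear_reconstruction(u_smooth, 2))   # slope close to 1
    print(dd_linear_reconstruction(u_jump, 2))     # slope stays close to 1; the jump is
                                                   # effectively ignored

The paper applies the same principle within k-exact reconstruction on arbitrary meshes, with a weight design chosen to obtain the relaxed ENO property described above.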