|
|
| A. S. Bondarenko, D. M. Bortz, and J. J. Moré,
"COPS: Large-Scale Nonlinearly Constrained Optimization Problems," Technical
Memorandum ANL/MCS-TM-237,
Sept. 1998, revised Oct. 1999. |
We have started the development of COPS, a collection of
large-scale nonlinearly Constrained Optimization ProblemS. The primary purpose of
this collection is to provide difficult test cases for optimization software.
Problems in the current version of the collection come from fluid dynamics, population
dynamics, optimal design, and optimal control. For each problem we provide a short
description of the problem, notes on the formulation of the problem, and results of
computational experiments with general optimization solvers. We currently have
results for DONLP2, LANCELOT, MINOS, SNOPT, and LOQO. |
|
|
| L. Freitag, "Users Manual for Opt-MS: Local
Methods for Simplicial Mesh Smoothing and Untangling," Technical Memorandum ANL/MCS-TM-239,
April 1999. |
Creating meshes containing good-quality elements is a
challenging, yet critical, problem facing computational scientists today. Several
researchers have shown that the size of the mesh, the shape of the elements within that
mesh, and their relationship to the physical application of interest can profoundly affect
the efficiency and accuracy of many numerical approximation techniques. If the
application contains anisotropic physics, the mesh can be improved by considering both
local characteristics of the approximate application solution and the geometry of the
computational domain. If the application is isotropic, regularly shaped elements in
the mesh reduce the discretization error, and the mesh can be improved a priori
by considering geometric criteria only. The Opt-MS package provides several local
node point smoothing techniques that improve elements in the mesh by adjusting grid point
location using geometric criteria. The package is easy to use; only three subroutine
calls are required for the user to begin using the software. The package is also
flexible; the user may change the technique, function, or dimension of the problem at any
time during the mesh smoothing process. Opt-MS is designed to interface with C and
C++ codes, and examples for both two- and three-dimensional meshes are provided. |
|
|
| H. G. Kaper, S. Tipei, and E. Wiebel, "Data
Sonification and Sound Visualization," Computing in Science and Engineering 1(4)
(July/Aug 1999), pp. 48-58. Also Preprint ANL/MCS-P738-0199,
Jan. 1999. |
This article describes a collaborative project
between researchers in the Mathematics and Computer Science Division at Argonne National
Laboratory and the Computer Music Project of the University of Illinois at
Urbana-Champaign. The project focuses on the use of sound for the exploration and
analysis of complex data sets in scientific computing. The article addresses digital
sound synthesis in the context of DIASS, a Digital Instrument for Additive Sound
Synthesis, and sound visualization in a virtual-reality environment by means of M4CAVE.
It describes the procedures and preliminary results of some experiments in
scientific sonification and sound visualization. |
|
|
| W. Benger, I. Foster, J. Novotny, E. Seidel, J.
Shalf, W. Smith, P. Walker, "Numerical Relativity in a Distributed Environment,"
Preprint ANL/MCS-P739-0199,
Jan. 1999. |
The Cactus parallel simulation framework
provides a modular and extensible set of components for solving relativity problems on
parallel computers. In recent work, we have investigated techniques that would enable
the execution of Cactus applications in wide area "computational grid"
environments. In a first study, we investigated the feasibility of distributing a
single simulation across multiple supercomputers, while in a second we studied techniques
for reducing communication costs associated with remote visualization and steering.
Distributed simulation was achieved by using MPICH-G, an implementation of the Message
Passing Interface standard that uses mechanisms provided by the Globus grid toolkit to
enable wide area execution. Experiments were performed across SGI Origins and Cray
T3Es with geographical separations ranging from hundreds to thousands of kilometers.
Total execution time increased by between 48% and 133% when the simulation was
distributed, depending on the configuration. We view these results as encouraging, as they were obtained with
essentially no specialized algorithmic structures in the Cactus application. Work
on remote visualization focused on the development of a Cactus module that computes
isosurfaces inline with numerical relativity calculations. Experiments demonstrated
that this technique can reduce network bandwidth requirements by a factor ranging from 2.5
to 114, depending on the nature of the problem. |
|
|
| Z. Ren, R. Sheng, and S. J. Wright,
"Advanced Computational Techniques for Laue Diffraction," Preprint ANL/MCS-P740-0199,
Feb. 1999. |
We describe LaueView, a code for processing the
measured intensity data in Laue X-ray diffraction experiments to obtain corrected
structure amplitudes for each reflection that take account of the various distortion
effects. The resulting corrected intensity data can then be used to recover the
molecular structure by isomorphous refinement or by solution of the phase problem.
We describe the key numerical techniques used in LaueView and outline the improvements we
made to obtain a new, more efficient, and parallel version of the code. We conclude
with some computational results obtained on a real data set that illustrate our
improvements. The basic principles of the Laue method are described in an appendix,
where we outline the distortions in the measured intensity data due to effects such as
blurring, overlap of the spots, the nonuniform distribution of intensities in the incident
X-ray beam, and absorption effects of various types. |
|
|
| R. Thakur, W. Gropp, and E. Lusk,
"Optimizing Noncontiguous Accesses in MPI-IO," Parallel
Computing 28 (2002), pp. 83-105. Also Preprint ANL/MCS-P742-0299,
Feb. 1999, revised Dec. 2000. |
The I/O access patterns of many parallel
applications consist of accesses to a large number of small, noncontiguous pieces of data.
If an application's I/O needs are met by making many small, distinct I/O requests,
however, the I/O performance degrades drastically. To avoid this problem, MPI-IO
allows users to access noncontiguous data with a single I/O function call, unlike in Unix
I/O. In this paper, we explain how critical this feature of MPI-IO is for high
performance and how it enables implementations to perform optimizations. An
application can be written in many different ways with MPI-IO. We classify the
different ways of expressing an application's I/O access pattern in MPI-IO into four
levels, called level 0 through level 3. We demonstrate that, for applications with
noncontiguous access patterns, the I/O performance improves significantly if users write
the application such that it makes level-3 MPI-IO requests (noncontiguous, collective)
rather than level-0 requests (Unix style). We describe how our MPI-IO
implementation, ROMIO, delivers high performance for noncontiguous requests. We
explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous
requests from one process and collective I/O for noncontiguous requests from multiple
processes. We describe how one can implement these optimizations portably on
multiple machines and file systems, control their memory requirements, and also achieve
high performance. We demonstrate the performance and portability with performance
results for three applications---an astrophysics-application template (DIST3D), the NAS
BTIO benchmark, and an unstructured code (UNSTRUC)---on five different parallel machines:
HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000. |
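For concreteness, a minimal sketch of a level-3 request in C with MPI (the file name, block sizes, and stride pattern here are illustrative assumptions, not taken from the paper):

    /* Each process reads every nprocs-th block of a shared file with a
       single collective call on a noncontiguous file view (level 3). */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        MPI_File fh;
        MPI_Datatype filetype;
        int rank, nprocs;
        const int blocklen = 64, nblocks = 1024;
        double *buf;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        buf = malloc((size_t)nblocks * blocklen * sizeof(double));

        MPI_Type_vector(nblocks, blocklen, blocklen * nprocs, MPI_DOUBLE,
                        &filetype);
        MPI_Type_commit(&filetype);
        MPI_File_open(MPI_COMM_WORLD, "data.in", MPI_MODE_RDONLY,
                      MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, (MPI_Offset)rank * blocklen * sizeof(double),
                          MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
        MPI_File_read_all(fh, buf, nblocks * blocklen, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        /* A level-0 version would instead issue nblocks independent
           reads at explicit offsets, the Unix-style pattern whose poor
           performance the paper documents. */
        MPI_Type_free(&filetype);
        free(buf);
        MPI_Finalize();
        return 0;
    }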
|
|
| L. Freitag and T. Urness, "Analyzing
Industrial Furnace Efficiency Using Comparative Visualization in a Virtual Reality
Environment," Preprint ANL/MCS-P744-0299,
Feb. 1999. |
We describe an interactive toolkit used to
perform comparative analysis of two or more data sets arising from numerical simulations.
Several techniques have been incorporated into this toolkit, including (1)
successive visualization of individual data sets, (2) data comparison techniques such as
computation and visualization of the differences between data sets, and (3) image
comparison methods such as scalar field height profiles plotted in a common coordinate
system. We describe each technique in detail and show example usage in an industrial
application aimed at designing an efficient, low-NOx burner for industrial furnaces.
Critical insights are obtained through interactively adjusted color maps, data culling,
and data manipulation. New paradigms for scaling small values in the data comparison
technique are described. The display device used for this application was the CAVE
virtual reality theater, and we describe the user interface to the visualization toolkit
and the benefits of immersive 3D visualization for comparative analysis. |
|
|
| R. Overbeek, M. Fonstein, M. D'Souza, G. D.
Pusch, and N. Maltsev, "Use of Contiguity on the Chromosome to Predict Functional
Coupling," In Silico Biology 1 (1999), pp. 93-108.
Also Preprint ANL/MCS-P746-0299,
Feb. 1999. |
The availability of a growing number of
completely sequenced genomes opens new opportunities for understanding complex
biological systems. Success of genome-based biology will, to a large extent, depend
on the development of new approaches and tools for efficient comparative analysis of the
genomes and their organization. We have developed a technique for detecting possible
functional coupling between genes based on detection of potential operons. The
approach involves computation of "pairs of close bi-directional best hits,"
which are pairs of genes that apparently occur within operons in multiple genomes.
Using these pairs, one can compose evidence (based on the number of distinct genomes and
the phylogenetic distance between the orthologous pairs) that a pair of genes is
potentially functionally coupled. The technique has revealed a surprisingly rich and
apparently accurate set of functionally coupled genes. The approach depends on the
use of a relatively large number of genomes, and the amount of detected coupling grows
dramatically as the number of genomes increases. |
|
|
| H. G. Kaper and S. Tipei, "An Abstract
Approach to Music," Preprint ANL/MCS-P748-0399,
March 1999. |
The notion of formalized music implies that a
musical composition can be described in mathematical terms. In this article we
explore some formal aspects of music and propose a framework for an abstract approach. |
|
|
| L. A. Freitag and P. Plassmann, "Local
Optimization-Based Simplicial Mesh Untangling and Improvement," Preprint ANL/MCS-P749-0399,
May 1999. |
We develop an optimization-based approach for
mesh untangling formulated by maximizing the minimum area or volume of the simplicial
elements in a local submesh. We show that the function level sets are convex
regardless of the position of the free vertex, and hence the local subproblem is
guaranteed to converge. Maximizing the minimum area of mesh elements, although
well-suited for mesh untangling, is not ideal for mesh improvement and its use often
results in poor quality meshes. We therefore combine the mesh untangling technique
with optimization-based mesh improvement techniques, and expand previous results to show
that the level sets for a commonly used mesh quality criterion are convex in the feasible
region. Typical results showing the effectiveness of the combined untangling and
smoothing techniques are given for both two- and three-dimensional meshes. |
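In generic notation (not necessarily the authors'), the local untangling subproblem over the position x of the free vertex is

    \[
      \max_{x} \; \min_{1 \le i \le m} A_i(x),
    \]

where A_i is the signed area or volume of the i-th simplex incident on the free vertex. Each A_i is affine in x, so every level set of the min function is an intersection of half-spaces and hence convex, which is the source of the convergence guarantee quoted above.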
|
|
| J. Bresnahan, I. Foster, J. Insley, B. Toonen,
S. Tuecke, "Communication Services for Advanced Network Applications," Preprint ANL/MCS-P750-0599,
May 1999. |
Advanced network applications such as remote
instrument control, collaborative environments, and remote I/O are distinguished from
"traditional" applications such as videoconferencing by their need to create
multiple, heterogeneous flows with different characteristics. For example, a single
application may require remote I/O for raw datasets, shared controls for a collaborative
analysis system, streaming video for image rendering data, and audio for collaboration.
Furthermore, each flow can have different requirements in terms of reliability,
network quality of service, security, etc. We argue that new approaches to
communication services, protocols, and network architecture are required both to provide
high-level abstractions for common flow types and to support user-level management of flow
creation and quality. We describe experiences with the development of such
applications and communication services. |
|
|
| L. A. Freitag, W. D. Gropp, P. D. Hovland, L. C.
McInnes, and B. F. Smith, "Infrastructure and Interfaces for Large-Scale Numerical
Software," Preprint ANL/MCS-P751-0599,
May 1999. |
The complexity of large-scale scientific
simulations often necessitates the combined use of multiple software packages developed by
different groups in areas such as adaptive mesh manipulations, scalable algebraic solvers,
and optimization. Historically, these packages have been combined by using custom
code. This practice inhibits experimentation with and comparison of multiple tools
that provide similar functionality through different implementations. The ALICE project, a
collaborative effort among researchers at Argonne National Laboratory, is exploring the
use of component-based software engineering to provide better interoperability among
numerical toolkits. We discuss some initial experiences in developing an infrastructure
and interfaces for high-performance numerical computing. |
|
|
| L. A. Freitag and R. M. Loy, "Adaptive,
Multiresolution Visualization of Large Data Sets Using Parallel Octrees," Preprint ANL/MCS-P752-0599,
May 1999. |
The interactive visualization and exploration of
large scientific data sets is a challenging and difficult task; their size often far
exceeds the performance and memory capacity of even the most powerful graphics
workstations. To address this problem, we have created a technique that combines
hierarchical data reduction methods with parallel computing to allow interactive
exploration of large data sets while retaining full-resolution capability. The
hierarchical representation is built in parallel by strategically inserting field data
into an octree data structure. We provide functionality that allows the user to
interactively adapt the resolution of the reduced data sets so that resolution is
increased in regions of interest without sacrificing local graphics performance. We
describe the creation of the reduced data sets using a parallel octree, the software
architecture of the system, and the performance of this system on the data from a
Rayleigh-Taylor instability simulation. |
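As a rough illustration of the idea (a minimal serial sketch, not the parallel structure the paper describes; the unit-cube bounds, fixed depth, and field averaging are assumptions), field samples can be accumulated into an octree whose interior nodes carry averaged values for reduced-resolution rendering:

    #include <stdlib.h>

    typedef struct Node {
        double value;            /* running mean of samples below node */
        long count;              /* number of samples accumulated */
        struct Node *child[8];
    } Node;

    static Node *node_new(void) { return calloc(1, sizeof(Node)); }

    /* Insert a sample v at (x,y,z) in [0,1)^3, refining 'depth' levels. */
    static void insert(Node *n, double x, double y, double z,
                       double v, int depth)
    {
        n->value += (v - n->value) / ++n->count;
        if (depth == 0) return;
        int i = (x >= 0.5) | ((y >= 0.5) << 1) | ((z >= 0.5) << 2);
        if (!n->child[i]) n->child[i] = node_new();
        insert(n->child[i], 2*x - (x >= 0.5), 2*y - (y >= 0.5),
               2*z - (z >= 0.5), v, depth - 1);
    }

Rendering at a coarse resolution then amounts to traversing the tree only to a requested depth, descending further only in regions of interest.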
|
|
| J. Cownie and W. Gropp, "A Standard
Interface for Debugger Access to Message Queue Information in MPI," Preprint ANL/MCS-P754-0699,
June 1999. |
This paper discusses the design and
implementation of an interface that allows a debugger to obtain the information necessary
to display the contents of the MPI message queues. The design has been implemented
in the TotalView debugger, and dynamic libraries that conform to the interface exist for
MPICH, as well as the proprietary MPI implementations from Compaq, IBM, and SGI. |
|
|
| W. Gropp and E. Lusk, "Reproducible
Measurements of MPI Performance Characteristics," Preprint ANL/MCS-P755-0699,
June 1999. |
In this paper we describe the difficulties
inherent in making accurate, reproducible measurements of message-passing performance.
We describe some of the mistakes often made in attempting such measurements and the
consequences of such mistakes. We describe mpptest, a suite of
performance measurement programs developed at Argonne National Laboratory, that attempts
to avoid such mistakes and obtain reproducible measures of MPI performance that can be
useful to both MPI implementors and MPI application writers. We include a number of
illustrative examples of its use. |
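To illustrate one of the pitfalls discussed (trusting a single, noisy measurement), here is a generic ping-pong timing sketch in C with MPI; this is not mpptest itself, and the message size and repetition count are arbitrary assumptions:

    #include <mpi.h>
    #include <stdio.h>

    #define REPS 1000
    #define NBYTES 1024

    int main(int argc, char **argv)
    {
        static char buf[NBYTES];
        int rank;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, NBYTES, MPI_BYTE, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, NBYTES, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();
        if (rank == 0)   /* average over many round trips */
            printf("avg one-way time: %g us\n",
                   (t1 - t0) / (2.0 * REPS) * 1e6);
        MPI_Finalize();
        return 0;
    }

Averaging over many round trips and repeating whole runs to check variance are among the basic precautions; the paper describes further safeguards built into mpptest.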
|
|
| M. Gomberg, C. Stacey, and J. Sayre,
"Scalable, Remote Administration of Windows NT," Preprint ANL/MCS-P756-0699,
June 1999. |
In the UNIX community there is an overwhelming
perception that NT is impossible to manage remotely and that NT administration doesn't
scale. This was essentially true with earlier versions of the operating system.
Even today, out of the box, NT is difficult to manage remotely. Many tools,
however, now make remote management of NT not only possible, but under some circumstances
very easy. In this paper we discuss how we at Argonne's Mathematics and Computer
Science Division manage all our NT machines remotely from a single console, with minimum
locally installed software overhead. We also present NetReg, which is a locally
developed tool for scalable registry management. NetReg allows us to apply a
registry change to a specified set of machines. It is a command line utility that
can be run in either interactive or batch mode and is written in Perl for Win32, taking
heavy advantage of the Win32::TieRegistry module. |
|
|
| G. von Laszewski and I. Foster, "Grid
Infrastructure to Support Science Portals for Large Scale Instruments," Preprint ANL/MCS-P757-0699,
June 1999. |
Soon, a new generation of scientific workbenches
will be developed as a collaborative effort among various research institutions in the
United States. These scientific workbenches will be accessed on the Web via portals.
Reusable components are needed to build such portals for different scientific
disciplines, allowing uniform desktop access to remote resources. Such components
will include tools and services enabling easy collaboration, job submission, job
monitoring, component discovery, and persistent object storage. Based on experience
gained from Grand Challenge applications for large-scale instruments, we demonstrate how
Grid infrastructure components can be used to support the implementation of science
portals. The availability of these components will simplify the prototype
implementation of a common portal architecture. |
|
|
| S. Bittner, "Selecting and Implementing the
PBS Scheduler on an SGI Onyx 2/Origin 2000," Preprint ANL/MCS-P758-0699,
June 1999. |
In the Mathematics and Computer Science Division
at Argonne National Laboratory, the demand for resources on the Onyx 2 exceeds the
resources available for consumption. To distribute these scarce resources
effectively, we need a scheduling and resource management package with multiple
capabilities. In particular, it must accept standard interactive user logins, allow
batch jobs, backfill the system based on available resources, and permit system activities
such as accounting to proceed without interruption. The package must include a
mechanism to treat the graphics pipes as a schedulable resource. Also required is the
ability to create advance reservations, offer dedicated system modes for large resource
runs and benchmarking, and track the resources consumed for each job run.
Furthermore, our users want to be able to obtain repeatable timing results on job
runs. And, of course, package costs must be carefully considered. We explored
several options, including NQE and various third-party products, before settling on the
PBS scheduler. |
|
|
| R. Armstrong, D. Gannon, A. Geist, K. Keahey, S.
Kohn, L. McInnes, S. Parker, and B. Smolinski, "Toward a Common Component
Architecture for High-Performance Scientific Computing," Preprint ANL/MCS-P759-0699,
June 1999. |
This paper describes work in progress to develop
a standard for interoperability among high-performance scientific components. This
research stems from growing recognition that the scientific community needs to better
manage the complexity of multidisciplinary simulations and better address scalable
performance issues on parallel and distributed architectures. Driving forces are the
need for fast connections among components that perform numerically intensive work and for
parallel collective interactions among components that use multiple processes or threads.
This paper focuses on the areas we believe are most crucial in this context,
namely, an interface definition language that supports scientific abstractions for
specifying component interfaces and a ports connections model for specifying component
interactions. |
|
|
| M. Anitescu, "On the Rate of Convergence of
Sequential Quadratic Programming with Nondifferentiable Exact Penalty Function in the
Presence of Constraint Degeneracy," Math. Program. Ser. A 92
(2002), pp. 359-386. Also Preprint ANL/MCS-P760-0699,
June 1999. |
We analyze the convergence of the sequential
quadratic programming (SQP) method for nonlinear programming for the case in which the
Jacobian of the active constraints is rank deficient at the solution and/or strict
complementarity does not hold for some or any feasible Lagrange multipliers. We use
a nondifferentiable exact penalty function, and we prove that the sequence generated by
the SQP is locally R-linearly convergent if the matrices of the quadratic program are
uniformly positive definite and bounded, provided that the Mangasarian-Fromovitz
constraint qualification and some second-order sufficiency conditions hold. |
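For reference, in generic notation (not necessarily the paper's): for the program min f(x) subject to g_i(x) <= 0 and h_j(x) = 0, a nondifferentiable exact penalty function of the kind analyzed here has the form

    \[
      P_c(x) = f(x) + c \max\{0,\, g_1(x), \dots, g_m(x),\,
               |h_1(x)|, \dots, |h_p(x)|\},
    \]

which, for c sufficiently large (exceeding a bound determined by the Lagrange multipliers), has local minimizers at local solutions of the constrained program.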
|
|
| M. Anitescu, "Degenerate Nonlinear
Programming with a Quadratic Growth Condition," SIAM J. Optim. 10(4)
(2000), pp. 1116-1135. Also Preprint ANL/MCS-P761-0699,
June 1999. |
We show that the quadratic growth condition and
the Mangasarian-Fromovitz constraint qualification imply that local minima of nonlinear
programs are isolated stationary points. As a result, when started sufficiently close to
such points, an L_\infty exact penalty sequential quadratic programming algorithm will
induce at least R-linear convergence of the iterates to such a local minimum. We
construct an example of a degenerate nonlinear program with a unique local minimum
satisfying the quadratic growth and the Mangasarian-Fromovitz constraint qualification but
for which no positive semidefinite augmented Lagrangian exists. We present numerical
results obtained using several nonlinear programming packages on this example, and discuss
its implications for some algorithms. |
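In generic notation, the quadratic growth condition at a local minimum x^* of a program with feasible set F says that there exist sigma > 0 and a neighborhood N of x^* with

    \[
      f(x) \;\ge\; f(x^*) + \sigma \,\|x - x^*\|^2
      \quad \text{for all } x \in F \cap N,
    \]

a condition implied by, but in general weaker than, the standard second-order sufficient conditions with strict complementarity.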
|
|
| H. M. Tufo and P. F. Fischer, "Terascale
Spectral Element Algorithms and Implementations," Preprint ANL/MCS-P762-0699,
June 1999. |
We describe the development and implementation
of an efficient spectral element code for multimillion gridpoint simulations of
incompressible flows in general two- and three-dimensional domains. We review basic
and recently developed algorithmic underpinnings that have resulted in good parallel and
vector performance on a broad range of architectures, including the terascale computing
systems now coming on line at the DOE labs. Sustained performance of 219 GFLOPS has
been recently achieved on 2048 nodes of the Intel ASCI-Red machine at Sandia. |
|
|
| O. Zaki, E. Lusk, W. Gropp, and D. Swider,
"Toward Scalable Performance Visualization with Jumpshot," Int'l J. of
High-Performance Computing Applications (to appear). Also Preprint ANL/MCS-P763-0699,
June 1999. |
Jumpshot is a graphical tool for understanding
the performance of parallel programs. It is in the tradition of the upshot tool, but
contains a number of extensions and enhancements that make it suitable for large-scale
parallel computations. Jumpshot takes as input a new, more flexible logfile format,
and comes with a library for generating such logfiles. An MPI profiling library is
also included, enabling the automatic generation of such logfiles from MPI programs.
Jumpshot is written in Java, and can easily be integrated as an applet into browser-based
computing environments. The most novel feature of Jumpshot is its automatic
detection of anomalous durations, drawing the user's attention to problem areas in a
parallel execution. This capability is particularly useful in large-scale parallel
computations containing very many events. |
|
|
| G. W. Crabtree, D. O. Gunter, H. G. Kaper, A. E.
Koshelev, G. K. Leaf, and V. M. Vinokur, "Numerical Simulations of Driven Vortex
Systems," Physical Review B 61, no. 2 (Jan. 2000), pp.
1446-1455. Also Preprint ANL/MCS-P764-0699,
June 1999. |
This article reports on several large-scale
numerical simulations of vortex systems that are driven through superconducting media with
defects. The simulations are based on the time-dependent Ginzburg-Landau equations.
The simulations demonstrate regimes of plastic and elastic steady-state motion in
the presence of a twin boundary, show the effect of regular and irregular arrays of point
defects on vortex trajectories, and show a mechanism by which vortices move through an
array of columnar defects. Also presented are the results of some transient
simulations in two and three dimensions, which show that, in the transition from the
Meissner state to the vortex state, vortices are formed by a process of deposition. |
|
|
| D. O. Gunter, H. G. Kaper, and G. K. Leaf,
"Implicit Integration of the Time-Dependent Ginzburg-Landau Equations of
Superconductivity," SIAM J. Sci. Comput. 23. no. 9
(2002), pp. 1944_1959. Also Preprint ANL/MCS-P765-0699,
June 1999, revised July 2000. |
This article is concerned with the integration of
the time-dependent Ginzburg-Landau (TDGL) equations of superconductivity. Four
algorithms, ranging from fully explicit to fully implicit, are presented and evaluated for
stability, accuracy, and compute time. The benchmark problem for the evaluation is
the equilibration of a vortex configuration in a superconductor that is embedded in a thin
insulator and subject to an applied magnetic field. |
|
|
| M. K. Mihçak, P. Moulin, M. Anitescu, and K.
Ramchandran, "Rate-Distortion-Optimal Subband Coding without Perfect-Reconstruction
Constraints," Preprint ANL/MCS-P766-0699,
June 1999. |
We investigate the design of subband coders
without the traditional perfect-reconstruction constraint on the filters. The coder
uses scalar quantizers, and its filters and bit allocation are designed so as to optimize
a rate-distortion criterion. Convexity properties play a central role in the
analysis. Our results hold for a broad class of rate-distortion criteria.
First, we show that optimality can be achieved using filter banks that are the cascade of
a (paraunitary) principal component filter bank for the input spectral process and a set
of pre- and post-filters surrounding each quantizer. An algorithm for computing the
globally optimal filters and bit allocation is given. We then develop closed-form
solutions for the special case of two-channel coders under an exponential rate-distortion
model. Finally, we investigate a constrained-length version of the filter design
problem, which is applicable to practical coding scenarios. While the optimal filter
banks are nearly perfect-reconstruction at high rates, we demonstrate an apparently
surprising advantage of optimal FIR filter banks: they significantly outperform
optimal perfect-reconstruction FIR filter banks at all bit rates. |
|
|
| J. G. Michalakes, R. J. Purser, and R. Swinbank,
"Data Structure and Parallel Decomposition Considerations on a Fibonacci Grid," Proc.
Numer. Weather Prediction Conf., Denver, Sept. 13-17, 1999 (to appear). Also
Preprint ANL/MCS-P767-0699. |
The Fibonacci grid, proposed by Swinbank and
Purser (1999), provides attractive properties for global numerical atmospheric prediction
by offering an optimally homogeneous, geometrically regular, and approximately isotropic
discretization, with (as always) only the polar regions requiring special numerical
treatment. It is a mathematical idealization, applied to the sphere, of the
multi-spiral patterns often found in botanical structures, such as in pine cones and
sunflower heads. Computationally, it is natural to organize the domain into zones,
in each of which the same pair (or possibly triplet) of Fibonacci spirals dominates. But
the further subdivision of such zones into tiles of a shape and size suitable for
distribution to the processors of a massively parallel computer requires careful
consideration to ensure that the subsequent spatial computations along the respective spirals,
especially those computations (such as compact differencing schemes) that involve
recursion, can be implemented in an efficient, load-balanced manner without requiring
excessive amounts of inter-processor communications. Even when traditional low-order
explicit methods are used, care must still be taken. We show how certain simple
properties of the Fibonacci sequence (whose numbers prescribe the multiplicities of the
members of the successive families of spirals that make up the grid) may be exploited in
the subdivision of grid zones into arrangements of triangular grid tiles, each with two
sides coincident with grid lines and a ragged "hypotenuse" approximately aligned
east-west. |
|
|
| S. J. Benson, L. C. McInnes, and J. J. Moré,
"GPCG: A Case Study in the Performance and Scalability of Optimization
Algorithms," Preprint ANL/MCS-P768-0799,
Sept. 2000. |
GPCG is an algorithm within the Toolkit for
Advanced Optimization (TAO) for solving bound constrained, convex quadratic problems.
Originally developed by Moré and Toraldo, this algorithm was designed for
large-scale problems but had been implemented only for a single processor. The TAO
implementation is available for a wide range of high-performance architectures and has
been tested on up to 64 processors to solve problems with over 2.5 million variables. |
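The problem class and the flavor of the method, in generic notation (consult the paper for the precise algorithm): GPCG solves

    \[
      \min_x \; q(x) = \tfrac{1}{2}\, x^T A x - b^T x
      \quad \text{subject to} \quad l \le x \le u,
    \]

with A symmetric positive semidefinite, by alternating gradient-projection steps, which identify a working set of active bounds, with conjugate gradient steps on the remaining free variables.
|
|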
| L. A. Freitag and P. M. Knupp, "Tetrahedral
Element Shape Optimization via the Jacobian Determinant and Condition Number,"
Preprint ANL/MCS-P769-0799,
July 1999. |
We present a new shape measure for tetrahedral
elements that is optimal in the sense that it gives the distance of a tetrahedron from the
set of inverted elements. This measure is constructed from the condition number of
the linear transformation between a unit equilateral tetrahedron and any tetrahedron with
positive volume. We use this shape measure to formulate two optimization objective
functions, which are differentiated by their goal: the first seeks to improve the average
quality of the tetrahedral mesh, the second aims to improve the worst quality element in
the mesh. Because the element condition number is not defined for tetrahedra with
negative volume, these objective functions can be used only when the initial mesh is
valid. Therefore, we formulate a third objective function using the determinant of
the element Jacobian that is suitable for mesh untangling. We review the
optimization techniques used with each objective function, and present experimental
results that demonstrate the effectiveness of the mesh improvement and untangling methods.
We show that a combined optimization approach that uses both condition number
objective functions obtains the best quality meshes. |
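In generic notation: if A is the Jacobian of the affine map taking a unit equilateral reference tetrahedron to the physical element, the shape measure is based on the condition number

    \[
      \kappa(A) = \|A\| \, \|A^{-1}\|,
    \]

which is smallest for the equilateral shape and grows without bound as the element degenerates, and which ceases to be a meaningful quality measure once the element inverts; hence the separate Jacobian-determinant objective for untangling.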
|
|
| H. G. Kaper and S. Tipei, "Formalizing the
Concept of Sound," Preprint ANL/MCS-P771-0799,
July 1999. |
The notion of formalized music implies that a
musical composition can be described in mathematical terms. In this article we
explore some formal aspects of music and propose a framework for an abstract approach. |
|
|
| S. J. Wright and D. Orban, "Properties of
the Log-Barrier Function on Degenerate Nonlinear Programs," Preprint ANL/MCS-P772-0799,
July 1999. |
We examine the sequence of local minimizers of
the log-barrier function for a nonlinear program near a solution at which second-order
sufficient conditions and the Mangasarian-Fromovitz constraint qualification are
satisfied, but the active constraint gradients are not necessarily linearly independent.
When a strict complementarity condition is satisfied, we show uniqueness of the
local minimizer of the barrier function in the vicinity of the nonlinear program solution,
and obtain a semi-explicit characterization of this point. When strict
complementarity does not hold, we obtain several other interesting characterizations, in
particular, an estimate of the distance between the minimizers of the barrier function and
the nonlinear program in terms of the barrier parameter, and a result about the direction
of approach of the sequence of minimizers of the barrier function to the nonlinear
programming solution. |
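For reference, in generic notation: for min f(x) subject to c_i(x) >= 0, the log-barrier function is

    \[
      P(x; \mu) = f(x) - \mu \sum_i \log c_i(x),
    \]

and the paper studies the behavior of its local minimizers x(\mu) as the barrier parameter \mu decreases to zero near a degenerate solution.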
|
|
| J. E. Flaherty, R. M. Loy, M. S. Shephard, and
J. D. Teresco, "Software for the Parallel Adaptive Solution of Conservation Laws by
Discontinuous Galerkin Methods," Preprint ANL/MCS-P773-0799,
July 1999. |
We develop software tools for the solution of
conservation laws using parallel adaptive discontinuous Galerkin methods. In
particular, the Rensselaer Partition Model (RPM) provides parallel mesh structures within
an adaptive framework to solve the Euler equations of compressible flow by a discontinuous
Galerkin method (LOCO). Results are presented for a Rayleigh-Taylor flow instability
for computations performed on 128 processors of an IBM SP computer. In addition to
managing the distributed data and maintaining a load balance, RPM provides information
about the parallel environment that can be used to tailor partitions to a specific
computational environment. |
|
|
| H. M. Tufo, P. F. Fischer, M. E. Papka, and M.
Szymanski, "Hairpin Vortex Formation, a Case Study for Unsteady Visualization,"
Preprint ANL/MCS-P774-0799,
July 1999. |
To better understand the vortex dynamics of
coherent structures in turbulent and transitional boundary layers, we consider direct
numerical simulation of the interaction between a flat-plate-boundary-layer flow and an
isolated hemispherical roughness element. Of principal interest is the evolution of
hairpin vortices that form an interlacing pattern in the wake of the hemisphere, lift away
from the wall, and are stretched by the shearing action of the boundary layer. Using
animations of unsteady three-dimensional representations of this flow, produced by the vtk
toolkit and enhanced to operate in a CAVE virtual environment, we identify and study
several key features in the evolution of this complex vortex topology not previously
observed in other visualization formats. |
|
|
| W. McCune and O. Shumsky, "Ivy: A
Preprocessor and Proof Checker for First-Order Logic," in Computer-Aided
Reasoning: ACL2 Case Studies, M. Kaufmann, P. Manolios, J. Moore, eds., Kluwer
Academic Publishers (2000). Also Preprint ANL/MCS-P775-0899,
Aug. 1999. |
This case study shows how non-ACL2 programs can
be combined with ACL2 functions in such a way that useful properties can be proved about
the composite programs. Nothing is proved about the non-ACL2 programs.
Instead, the results of the non-ACL2 programs are checked at run time by ACL2 functions,
and properties of these checker functions are proved. The application is
resolution/paramodulation automated theorem proving for first-order logic. The top
ACL2 function takes a conjecture, preprocesses the conjecture, and calls a non-ACL2
program to search for a proof or countermodel. If the non-ACL2 program succeeds,
ACL2 functions check the proof or countermodel. The top ACL2 function is proved
sound with respect to finite interpretations. |
|
|
| W. K. Anderson, W. D. Gropp, D. K. Kaushik, D.
E. Keyes, and B. F. Smith, "Achieving High Sustained Performance in an Unstructured
Mesh CFD Application," Preprint ANL/MCS-P776-0899,
Aug. 1999. |
Overview: Many applications of economic and
national security importance require the solution of nonlinear partial differential
equations (PDEs). In many cases, PDEs possess a wide range of time scales---some
(e.g., acoustic) faster than the phenomena of prime interest (e.g., convective),
suggesting the need for implicit methods. In addition, many applications are
geometrically complex and possess a wide range of length scales, requiring an unstructured
mesh to adequately resolve the problem without requiring an excessive number of mesh
points and to accomplish mesh generation and adaptation (almost) automatically. The
best algorithms for solving nonlinear implicit problems are often Newton methods, which
themselves require the solution of very large, sparse linear systems. The best
algorithms for these sparse linear problems, particularly at very large sizes, are often
preconditioned iterative methods. This nested hierarchy of tunable algorithms has
proved effective in solving complex problems in areas such as aerodynamics, combustion,
radiation transport, and global circulation. Typically, for steady-state solutions
from a trivial initial guess, the number of "work units" (evaluations of the
discrete residuals on the finest mesh on which the problem is represented) is around 10^3
(to achieve reductions in the norm of the residual of 10^-8 to 10^-12). Although
these algorithms are efficient (in the sense of using relatively few floating-point
operations to arrive at the final result), they do not necessarily achieve the absolute
flops-per-second (flop/s) ratings that less efficient or less versatile algorithms may. |
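The nested hierarchy referred to above is, in generic notation, an outer Newton iteration on the discrete residual F(u) = 0,

    \[
      J(u_k)\,\delta_k = -F(u_k), \qquad u_{k+1} = u_k + \lambda_k \delta_k,
    \]

with each large sparse linear system solved inexactly by a preconditioned Krylov method; the tunable pieces include the inner solver tolerance, the preconditioner, and the globalization parameter \lambda_k.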
|
|
| P. F. Fischer and J. S. Mullen,
"Filter-based Stabilization of Spectral Element Methods," C. R.
Acad. Sci. Paris 332(1) (2001) 265-270. Also Preprint ANL/MCS-P777-0899,
Aug. 1999. |
We present a simple filtering procedure for
stabilizing the spectral element method (SEM) for the unsteady advection-diffusion and
Navier-Stokes equations. A number of example applications are presented, along with
basic analysis for the advection-diffusion case. |
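A filter of this kind can be written, in generic notation (the precise operator and weight are in the paper), as

    \[
      F_\alpha u = \alpha\, \Pi_{N-1} u + (1 - \alpha)\, u,
    \]

where u is the degree-N polynomial solution on each element, \Pi_{N-1} is interpolation onto the degree-(N-1) space, and a small weight \alpha damps only the highest mode, preserving spectral accuracy.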
|
|
| P. F. Fischer and H. M. Tufo,
"High-Performance Spectral Element Algorithms and Implementations," Preprint ANL/MCS-P778-0899,
Aug. 1999. |
We describe the development and implementation
of a spectral element code for multimillion gridpoint simulations of incompressible flows
in general two- and three-dimensional domains. Parallel performance is presented for up
to 2048 nodes of the Intel ASCI-Red machine at Sandia. |
|
|
| H. M. Tufo, P. F. Fischer, M. E. Papka, and
K. Blom, "Numerical Simulation and Immersive Visualization of Hairpin Vortices,"
Preprint ANL/MCS-P779-0899,
Aug. 1999. |
To better understand the vortex dynamics of
coherent structures in turbulent and transitional boundary layers, we consider direct
numerical simulation of the interaction between a flat-plate-boundary-layer flow and an
isolated hemispherical roughness element. Of principal interest is the evolution of
hairpin vortices that form an interlacing pattern in the wake of the hemisphere, lift away
from the wall, and are stretched by the shearing action of the boundary layer. Using
animations of unsteady three-dimensional representations of this flow, produced by the vtk
toolkit and enhanced to operate in a CAVE virtual environment, we identify and study
several key features in the evolution of this complex vortex topology not previously
observed in other visualization formats. |
|
|
| J. L. Tilson, R. Shepard, C. Naleway, A. F.
Wagner, and W. C. Ermler, "Ab Initio Determination of Americium Ionization
Potentials," Preprint
ANL/MCS-P780-0899,
Aug. 1999. |
The first three ionization potentials of
americium are calculated using ab initio spin-orbit configuration interaction techniques.
These results compare favorably with available experimental and previous
theoretical work. The lowest two ionization potentials are accurately determined
using wave functions constructed as simple single and double substitutions from a
self-consistent field reference configuration with scalar relativistic effects included
through an averaged relativistic pseudopotential. A determination of the third
ionization potential to comparable accuracy requires inclusion of the spin-orbit operator
and significant intermediate coupling with a resulting configuration expansion length in
excess of 1.9 million double-group adapted functions. The solution to this problem
was achieved by application of a new parallel spin-orbit configuration interaction
component to the COLUMBUS Program System. A decomposition of the ionization
potential calculation into parts either sensitive or largely insensitive to the spin-orbit
operator is favorably tested, leading to hybrid calculations of improved accuracy. |
|
|
| M. D'Souza, M. F. Romine, and N. Maltsev,
"SENTRA, a Database of Signal Transduction Proteins," Nucleic Acids Research
(to appear Jan. 2000). Also Preprint ANL/MCS-P781-0999,
Sept. 1999. |
SENTRA is a database of proteins associated
with microbial signal transduction. The database currently includes the classical
two-component signal transduction pathway proteins and methyl-accepting chemotaxis
proteins, but will be expanded to also include other classes of signal transduction
systems that are modulated by phosphorylation or methylation reactions. Although the
majority of database entries are from prokaryotic systems, eukaryotic proteins with
bacterial-like signal transduction domains are also included. Currently SENTRA
contains signal transduction proteins in 34 completely or almost completely sequenced
prokaryotic genomes, as well as sequences from 243 organisms available in public databases
(SwissProt and EMBL). The analysis was carried out within the framework of the WIT2
system, which is designed and implemented to support genetic sequence analysis and
comparative analysis of sequenced genomes. |
|
|
| J. L. Tilson, W. C. Ermler, and R. M. Pitzer,
"Parallel Spin-Orbit Coupled Configuration Interaction," Computer Physics
Communications 128 (2000), pp. 128-138. Also Preprint ANL/MCS-P782-0999,
Sept. 1999. |
A parallel spin-orbit configuration interaction
(SOCI) code has been developed. This code, named P-SOCI, is an extension of an
existing sequential SOCI program and permits solutions to heavy-element systems requiring
both explicit spin-orbit (SO) effects and significant electron correlation. The
relativistic procedure adopted here is an ab initio conventional configuration interaction
(CI) method that constructs a Hamiltonian matrix in a double-group-adapted basis.
P-SOCI enables solutions to problems far larger than possible with the original code by
exploiting the resources of large massively parallel processing computers (MPP).
This increase in capability permits not only the continued inclusion of explicit
spin-orbit effects but now also a significant amount of non-dynamic and dynamic
correlation as is necessary for a good description of heavy-element systems. |
|
|
| S. J. Wright, "Recent Developments in
Interior-Point Methods," Proc. 19th IFIP TC7 Conference on System Modeling and
Optimization, Kluwer Academic Publishers (to appear 2000). Also Preprint ANL/MCS-P783-0999,
Sept. 1999. |
The modern era of interior-point methods dates
to 1984, when Karmarkar proposed his algorithm for linear programming. In the years
since then, algorithms and software for linear programming have become quite
sophisticated, while extensions to more general classes of problems, such as convex
quadratic programming, semidefinite programming, and nonconvex and nonlinear problems,
have reached varying levels of maturity. Interior-point methodology has been used as
part of the solution strategy in many other optimization contexts as well, including
analytic center methods and column-generation algorithms for large linear programs.
We review some core developments in the area and discuss their impact on these other
problem areas. |
|
|
| R. Cools and J. N. Lyness, "Three and
Four-Dimensional K-Optimal Lattice Rules of Moderate Trigonometric Degree,"
Math. Comp. 70, no. 236 (2001), 1549-1567. Also Preprint ANL/MCS-P784-1199,
Nov. 1999, revised 6/00. |
A systematic search has been undertaken for
optimal lattice rules of specified trigonometric degree d over the hypercube
[0,1)^s. The search is restricted to a population K(s,\delta)
of lattice rules Q(\Lambda), comprising those whose dual lattice
\Lambda^\perp may be generated by s points h, for each of which |h|
= \delta = d + 1. The underlying theory, which suggests that such a
restriction might be helpful, is presented. The general character of the search is
described, and, for s = 3, d < 21, a list of K-optimal
rules is given. It is not known whether these are also optimal rules in the general
sense; this matter is discussed. |
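In the standard notation for this subject, a lattice rule and the degree criterion used in the search can be summarized as

    \[
      Q(\Lambda) f = \frac{1}{N} \sum_{j=0}^{N-1} f(x_j),
      \qquad x_j \in \Lambda \cap [0,1)^s,
    \]

where Q(\Lambda) has trigonometric degree at least d exactly when every nonzero point h of the dual lattice \Lambda^\perp satisfies |h| >= d + 1, with |h| the sum of the absolute values of the components of h.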
|
|
| G. von Laszewski, M. L. Westbrook, C. Barnes, I.
Foster, E. M. Westbrook, "Using Computational Grid Capabilities to Enhance the
Capability of an X-Ray Source for Structural Biology," Preprint ANL/MCS-P785-1299,
Dec. 1999. |
The Advanced Photon Source at Argonne National
Laboratory enables structural biologists to perform state-of-the-art crystallography
diffraction experiments with high-intensity X-rays. The data gathered during such
experiments is used to determine the molecular structure of macromolecules to enhance, for
example, the capabilities of modern drug design for basic and applied research. The
steps involved in obtaining a complete structure are computationally intensive and require
the proper adjustment of a considerable number of parameters that are not known a priori.
Thus, it is advantageous to develop a computational infrastructure for solving the
numerically complex problems quickly, in order to enable quasi-real-time information
discovery and computational steering. Specifically, we propose that the
time-consuming calculations be performed in a "computational grid" accessing a
large number of state-of-the-art computational facilities. Furthermore, we envision
that experiments could be conducted by researchers at their home institution via remote
steering while a beamline technician performs the actual experiment; such an approach
would be cost-efficient for the user. We conducted a case study involving multiple
tasks of a structural biologist, including data acquisition, data reduction, solution of
the phase problem, and calculation of the final result---an electron density map, which is
subsequently used for modeling of the molecular structure. We developed a parallel
program for the data reduction phase that reduces the turnaround time significantly.
We also distributed the solution of the phase problem in order to obtain the
resulting electron density map more quickly. We used the GUSTO testbed provided by
the Globus metacomputing project as the source of the necessary state-of-the-art
computational resources, including workstation clusters. |
|
|