
1999 Abstracts of MCS Reports and Preprints


Reports

A. S. Bondarenko, D. M. Bortz, and J. J. Moré, "COPS: Large-Scale Nonlinearly Constrained Optimization Problems," Technical Memorandum ANL/MCS-TM-237, Sept. 1998, revised Oct. 1999. We have started the development of COPS, a collection of large-scale nonlinearly Constrained Optimization ProblemS.  The primary purpose of this collection is to provide difficult test cases for optimization software.  Problems in the current version of the collection come from fluid dynamics, population dynamics, optimal design, and optimal control.  For each problem we provide a short description of the problem, notes on the formulation of the problem, and results of computational experiments with general optimization solvers.  We currently have results for DONLP2, LANCELOT, MINOS, SNOPT, and LOQO.
L. Freitag, "Users Manual for Opt-MS: Local Methods for Simplicial Mesh Smoothing and Untangling," Technical Memorandum ANL/MCS-TM-239, April 1999. Creating meshes containing good-quality elements is a challenging, yet critical, problem facing computational scientists today.  Several researches have shown that the size of the mesh, the shape of the elements within that mesh, and their relationship to the physical application of interest can profoundly affect the efficiency and accuracy of many numerical approximation techniques.  If the application contains anisotropic physics, the mesh can be improved by considering both local characteristics of the approximate application solution and the geometry of the computational domain.  If the application is isotropic, regularly shaped elements in the mesh reduce the discretization error, and the mesh can be improved a priori by considering geometric criteria only.  The Opt-MS package provides several local node point smoothing techniques that improve elements in the mesh by adjusting grid point location using geometric criteria.  The package is easy to use; only three subroutine calls are required for the user to begin using the software.  The package is also flexible; the user may change the technique, function, or dimension of the problem at any time during the mesh smoothing process.  Opt-MS is designed to interface with C and C++ codes, and examples for both two- and three-dimensional meshes are provided.

Preprints

H. G. Kaper, S. Tipei, and E. Wiebel, "Data Sonification and Sound Visualization," Computing in Science and Engineering 1(4) (July/Aug 1999), pp. 48-58.  Also Preprint ANL/MCS-P738-0199, Jan. 1999. This article describes a collaborative project between researchers in the Mathematics and Computer Science Division at Argonne National Laboratory and the Computer Music Project of the University of Illinois at Urbana-Champaign.  The project focuses on the use of sound for the exploration and analysis of complex data sets in scientific computing.  The article addresses digital sound synthesis in the context of DIASS, a Digital Instrument for Additive Sound Synthesis, and sound visualization in a virtual-reality environment by means of M4CAVE.  It describes the procedures and preliminary results of some experiments in scientific sonification and sound visualization.
W. Benger, I. Foster, J. Novotny, E. Seidel, J. Shalf, W. Smith, P. Walker, "Numerical Relativity in a Distributed Environment," Preprint ANL/MCS-P739-0199, Jan. 1999. The Cactus parallel simulation framework provides a modular and extensible set of components for solving relativity problems on parallel computers.  In recent work, we have investigated techniques that would enable the execution of Cactus applications in wide area "computational grid" environments.  In a first study, we investigated the feasibility of distributing a single simulation across multiple supercomputers, while in a second we studied techniques for reducing communication costs associated with remote visualization and steering.  Distributed simulation was achieved by using MPICH-G, an implementation of the Message Passing Interface standard that uses mechanisms provided by the Globus grid toolkit to enable wide area execution.  Experiments were performed across SGI Origins and Cray T3Es with geographical separations ranging from hundreds to thousands of kilometers.  Total execution time when distributed increased by between 48% and 133%, depending on configuration.  We view these results as encouraging as they were obtained with essentially no specialized algorithmic structures in the Cactus application.  Work on remote visualization focused on the development of a Cactus module that computes isosurfaces inline with numerical relativity calculations.  Experiments demonstrated that this technique can reduce network bandwidth requirements by a factor ranging from 2.5 to 114, depending on the nature of the problem.
Z. Ren, R. Sheng, and S. J. Wright, "Advanced Computational Techniques for Laue Diffraction," Preprint ANL/MCS-P740-0199, Feb. 1999. We describe LaueView, a code for processing the measured intensity data in Laue X-ray diffraction experiments to obtain corrected structure amplitudes for each reflection that take account of the various distortion effects.  The resulting corrected intensity data can then be used to recover the molecular structure by isomorphous refinement or by solution of the phase problem.   We describe the key numerical techniques used in LaueView and outline the improvements we made to obtain a new, more efficient, and parallel version of the code.   We conclude with some computational results obtained on a real data set that illustrate our improvements.  The basic principles of the Laue method are described in an appendix, where we outline the distortions in the measured intensity data due to effects such as blurring, overlap of the spots, the nonuniform distribution of intensities in the incident X-ray beam, and absorption effects of various types.
R. Thakur, W. Gropp, and E. Lusk, "Optimizing Noncontiguous Accesses in MPI-IO," Parallel Computing 28 (2002), pp. 83-105.  Also Preprint ANL/MCS-P742-0299, Feb. 1999, revised Dec. 2000. The I/O access patterns of many parallel applications consist of accesses to a large number of small, noncontiguous pieces of data.  If an application's I/O needs are met by making many small, distinct I/O requests, however, the I/O performance degrades drastically.  To avoid this problem, MPI-IO allows users to access noncontiguous data with a single I/O function call, unlike in Unix I/O.  In this paper, we explain how critical this feature of MPI-IO is for high performance and how it enables implementations to perform optimizations.  An application can be written in many different ways with MPI-IO.  We classify the different ways of expressing an application's I/O access pattern in MPI-IO into four levels, called level 0 through level 3.  We demonstrate that, for applications with noncontiguous access patterns, the I/O performance improves significantly if users write the application such that it makes level-3 MPI-IO requests (noncontiguous, collective) rather than level-0 requests (Unix style).  We describe how our MPI-IO implementation, ROMIO, delivers high performance for noncontiguous requests.  We explain in detail the two key optimizations ROMIO performs: data sieving for noncontiguous requests from one process and collective I/O for noncontiguous requests from multiple processes.  We describe how one can implement these optimizations portably on multiple machines and file systems, control their memory requirements, and also achieve high performance.  We demonstrate the performance and portability with performance results for three applications---an astrophysics-application template (DIST3D), the NAS BTIO benchmark, and an unstructured code (UNSTRUC)---on five different parallel machines:  HP Exemplar, IBM SP, Intel Paragon, NEC SX-4, and SGI Origin2000.
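As an illustration of what a level-3 request looks like, the following sketch (not from the paper; the file name and array sizes are illustrative, and nprocs is assumed to divide 1024) reads each process's noncontiguous column block of a 2-D array with a single collective call through a file view:

```c
#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Global 1024x1024 array of doubles, partitioned in column blocks. */
    int gsizes[2] = {1024, 1024};
    int lsizes[2] = {1024, 1024 / nprocs};
    int starts[2] = {0, rank * lsizes[1]};

    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "data.bin", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    /* The view exposes only this process's noncontiguous file region. */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);

    double *local = malloc((size_t)lsizes[0] * lsizes[1] * sizeof(double));
    /* One collective call (level 3); ROMIO can apply collective I/O here. */
    MPI_File_read_all(fh, local, lsizes[0] * lsizes[1], MPI_DOUBLE,
                      MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    free(local);
    MPI_Finalize();
    return 0;
}
```

A level-0 version of the same access would issue one small read per row, which is exactly the pattern that data sieving and collective I/O are designed to avoid.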
L. Freitag and T. Urness, "Analyzing Industrial Furnace Efficiency Using Comparative Visualization in a Virtual Reality Environment," Preprint ANL/MCS-P744-0299, Feb. 1999. We describe an interactive toolkit used to perform comparative analysis of two or more data sets arising from numerical simulations.  Several techniques have been incorporated into this toolkit, including (1) successive visualization of individual data sets, (2) data comparison techniques such as computation and visualization of the differences between data sets, and (3) image comparison methods such as scalar field height profiles plotted in a common coordinate system.  We describe each technique in detail and show example usage in an industrial application aimed at designing an efficient, low-NOx burner for industrial furnaces.  Critical insights are obtained by interactively adjusting color maps, culling data, and manipulating data.  New paradigms for scaling small values in the data comparison technique are described.  The display device used for this application was the CAVE virtual reality theater, and we describe the user interface to the visualization toolkit and the benefits of immersive 3D visualization for comparative analysis.
R. Overbeek, M. Fonstein, M. D'Souza, G. D. Pusch, and N. Maltsev, "Use of Contiguity on the Chromosome to Predict Functional Coupling," In Silico Biology 1 (1999), pp. 93-108.  Also Preprint ANL/MCS-P746-0299, Feb. 1999. The availability of a growing number of completely sequenced genomes opens new opportunities for understanding complex biological systems.  Success of genome-based biology will, to a large extent, depend on the development of new approaches and tools for efficient comparative analysis of the genomes and their organization.  We have developed a technique for detecting possible functional coupling between genes based on detection of potential operons.  The approach involves computation of "pairs of close bidirectional best hits," which are pairs of genes that apparently occur within operons in multiple genomes.  Using these pairs, one can compose evidence (based on the number of distinct genomes and the phylogenetic distance between the orthologous pairs) that a pair of genes is potentially functionally coupled.  The technique has revealed a surprisingly rich and apparently accurate set of functionally coupled genes.  The approach depends on the use of a relatively large number of genomes, and the amount of detected coupling grows dramatically as the number of genomes increases.
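The basic test can be sketched in a few lines of C.  This is a schematic restatement, not the authors' code: best_hit indexes each gene's best hit in the other genome, positions are gene indices along the chromosome, and a fixed gap threshold stands in for the "within a potential operon" criterion:

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int pos;      /* gene index along its chromosome */
    int best_hit; /* index of best hit in the other genome; -1 if none */
} Gene;

static int close_on_chromosome(const Gene *a, const Gene *b, int max_gap) {
    return abs(a->pos - b->pos) <= max_gap;
}

/* Genes i and j of genome A form a "pair of close bidirectional best
   hits" if each is a reciprocal best hit with its counterpart in genome B
   and both pairs are chromosomally close. */
static int is_pcbbh(const Gene *A, const Gene *B, int i, int j, int max_gap) {
    int hi = A[i].best_hit, hj = A[j].best_hit;
    if (hi < 0 || hj < 0) return 0;
    if (B[hi].best_hit != i || B[hj].best_hit != j) return 0;
    return close_on_chromosome(&A[i], &A[j], max_gap) &&
           close_on_chromosome(&B[hi], &B[hj], max_gap);
}

int main(void) {
    Gene A[2] = {{10, 0}, {11, 1}};  /* adjacent genes in genome A        */
    Gene B[2] = {{52, 0}, {53, 1}};  /* reciprocal hits, adjacent in B    */
    printf("PCBBH: %d\n", is_pcbbh(A, B, 0, 1, 3));   /* prints 1 */
    return 0;
}
```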
H. G. Kaper and S. Tipei, "An Abstract Approach to Music," Preprint ANL/MCS-P748-0399, March 1999. The notion of formalized music implies that a musical composition can be described in mathematical terms.  In this article we explore some formal aspects of music and propose a framework for an abstract approach.
L. A. Freitag and P. Plassmann, "Local Optimization-Based Simplicial Mesh Untangling and Improvement," Preprint ANL/MCS-P749-0399, May 1999. We develop an optimization-based approach for mesh untangling formulated by maximizing the minimum area or volume of the simplicial elements in a local submesh.  We show that the function level sets are convex regardless of the position of the free vertex, and hence the local subproblem is guaranteed to converge.  Maximizing the minimum area of mesh elements, although well-suited for mesh untangling, is not ideal for mesh improvement, and its use often results in poor quality meshes.  We therefore combine the mesh untangling technique with optimization-based mesh improvement techniques, and expand previous results to show that the level sets for a commonly used mesh quality criterion are convex in the feasible region.  Typical results showing the effectiveness of the combined untangling and smoothing techniques are given for both two- and three-dimensional meshes.
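Stated generically (a restatement for the reader, not necessarily the paper's notation), the local untangling subproblem over the free-vertex position x is

```latex
\max_{x \in \mathbb{R}^d} \; \min_{i = 1,\dots,m} A_i(x),
```

where A_i(x) is the signed area (or volume) of the i-th element of the local submesh.  Each A_i is affine in x, so every level set \{x : \min_i A_i(x) \ge c\} is an intersection of half-spaces and hence convex, which is what guarantees convergence of the local subproblem.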
J. Bresnahan, I. Foster, J. Insley, B. Toonen, S. Tuecke, "Communication Services for Advanced Network Applications," Preprint ANL/MCS-P750-0599, May 1999. Advanced network applications such as remote instrument control, collaborative environments, and remote I/O are distinguished from "traditional" applications such as videoconferencing by their need to create multiple, heterogeneous flows with different characteristics.  For example, a single application may require remote I/O for raw datasets, shared controls for a collaborative analysis system, streaming video for image rendering data, and audio for collaboration.   Furthermore, each flow can have different requirements in terms of reliability, network quality of service, security, etc.  We argue that new approaches to communication services, protocols, and network architecture are required both to provide high-level abstractions for common flow types and to support user-level management of flow creation and quality.  We describe experiences with the development of such applications and communication services.
L. A. Freitag, W. D. Gropp, P. D. Hovland, L. C. McInnes, and B. F. Smith, "Infrastructure and Interfaces for Large-Scale Numerical Software,"  Preprint ANL/MCS-P751-0599, May 1999. The complexity of large-scale scientific simulations often necessitates the combined use of multiple software packages developed by different groups in areas such as adaptive mesh manipulations, scalable algebraic solvers, and optimization.  Historically, these packages have been combined by using custom code.  This practice inhibits experimentation with and comparison of multiple tools that provide similar functionality through different implementations. The ALICE project, a collaborative effort among researchers at Argonne National Laboratory, is exploring the use of component-based software engineering to provide better interoperability among numerical toolkits. We discuss some initial experiences in developing an infrastructure and interfaces for high-performance numerical computing.
L. A. Freitag and R. M. Loy, "Adaptive, Multiresolution Visualization of Large Data Sets Using Parallel Octrees," Preprint ANL/MCS-P752-0599, May 1999. The interactive visualization and exploration of large scientific data sets is a challenging and difficult task; their size often far exceeds the performance and memory capacity of even the most powerful graphics workstations.  To address this problem, we have created a technique that combines hierarchical data reduction methods with parallel computing to allow interactive exploration of large data sets while retaining full-resolution capability.  The hierarchical representation is built in parallel by strategically inserting field data into an octree data structure.  We provide functionality that allows the user to interactively adapt the resolution of the reduced data sets so that resolution is increased in regions of interest without sacrificing local graphics performance.  We describe the creation of the reduced data sets using a parallel octree, the software architecture of the system, and the performance of the system on data from a Rayleigh-Taylor instability simulation.
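A schematic C sketch of the kind of reduced-resolution octree described here (an assumption for illustration, not the authors' actual data structure): each interior node stores an aggregate field value, so any subtree can be drawn coarsely while regions of interest descend toward the leaves.

```c
#include <stddef.h>

typedef struct OctNode {
    double center[3];         /* center of this octant                  */
    double half_width;        /* half the octant edge length            */
    double field_avg;         /* field value reduced over the subtree   */
    struct OctNode *child[8]; /* NULL children mark a leaf              */
} OctNode;

/* Emit nodes down to a user-selected depth: a larger max_depth in a
   region of interest yields higher resolution there. */
static void emit(const OctNode *n, int max_depth,
                 void (*draw)(const OctNode *)) {
    if (n == NULL) return;
    if (max_depth == 0 || n->child[0] == NULL) { draw(n); return; }
    for (int i = 0; i < 8; i++) emit(n->child[i], max_depth - 1, draw);
}
```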
J. Cownie and W. Gropp, "A Standard Interface for Debugger Access to Message Queue Information in MPI," Preprint ANL/MCS-P754-0699, June 1999. This paper discusses the design and implementation of an interface that allows a debugger to obtain the information necessary to display the contents of the MPI message queues.  The design has been implemented in the TotalView debugger, and dynamic libraries that conform to the interface exist for MPICH, as well as the proprietary MPI implementations from Compaq, IBM, and SGI.
W. Gropp and E. Lusk, "Reproducible Measurements of MPI Performance Characteristics," Preprint ANL/MCS-P755-0699, June 1999. In this paper we describe the difficulties inherent in making accurate, reproducible measurements of message-passing performance.  We describe some of the mistakes often made in attempting such measurements and the consequences of such mistakes.  We describe mpptest, a suite of performance measurement programs developed at Argonne National Laboratory, that attempts to avoid such mistakes and obtain reproducible measures of MPI performance that can be useful to both MPI implementors and MPI application writers.  We include a number of illustrative examples of its use.
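The flavor of the measurement discipline argued for here can be shown with a minimal ping-pong sketch (an illustration of the principles, not mpptest itself): synchronize before timing, time many round trips per trial, repeat the trial, and keep the minimum to filter out transient interference.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (nprocs < 2) { MPI_Finalize(); return 1; }   /* needs two ranks */

    enum { LEN = 1024, REPS = 100, TRIALS = 10 };
    char buf[LEN] = {0};
    double best = 1e30;

    for (int t = 0; t < TRIALS; t++) {
        MPI_Barrier(MPI_COMM_WORLD);      /* synchronize before each trial */
        double t0 = MPI_Wtime();
        for (int r = 0; r < REPS; r++) {  /* many round trips per timing   */
            if (rank == 0) {
                MPI_Send(buf, LEN, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, LEN, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, LEN, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, LEN, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = (MPI_Wtime() - t0) / (2.0 * REPS);  /* one-way time */
        if (dt < best) best = dt;   /* minimum suppresses interference  */
    }
    if (rank == 0)
        printf("min one-way time for %d bytes: %g s\n", LEN, best);
    MPI_Finalize();
    return 0;
}
```

Timing a single message, or averaging without discarding perturbed trials, are among the mistakes such a loop structure is meant to avoid.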
M. Gomberg, C. Stacey, and J. Sayre, "Scalable, Remote Administration of Windows NT," Preprint ANL/MCS-P756-0699, June 1999. In the UNIX community there is an overwhelming perception that NT is impossible to manage remotely and that NT administration doesn't scale.  This was essentially true with earlier versions of the operating system.  Even today, out of the box, NT is difficult to manage remotely.  Many tools, however, now make remote management of NT not only possible, but under some circumstances very easy.  In this paper we discuss how we at Argonne's Mathematics and Computer Science Division manage all our NT machines remotely from a single console, with minimum locally installed software overhead.  We also present NetReg, a locally developed tool for scalable registry management.  NetReg allows us to apply a registry change to a specified set of machines.  It is a command line utility that can be run in either interactive or batch mode; it is written in Perl for Win32 and makes heavy use of the Win32::TieRegistry module.
G. von Laszewski and I. Foster, "Grid Infrastructure to Support Science Portals for Large Scale Instruments," Preprint ANL/MCS-P757-0699, June 1999. Soon, a new generation of scientific workbenches will be developed as a collaborative effort among various research institutions in the United States.  These scientific workbenches will be accessed on the Web via portals.  Reusable components are needed to build such portals for different scientific disciplines, allowing uniform desktop access to remote resources.  Such components will include tools and services enabling easy collaboration, job submission, job monitoring, component discovery, and persistent object storage.  Based on experience gained from Grand Challenge applications for large-scale instruments, we demonstrate how Grid infrastructure components can be used to support the implementation of science portals.  The availability of these components will simplify the prototype implementation of a common portal architecture.
S. Bittner, "Selecting and Implementing the PBS Scheduler on an SGI Onyx 2/Origin 2000," Preprint ANL/MCS-P758-0699, June 1999. In the Mathematics and Computer Science Division at Argonne National Laboratory, the demand for resources on the Onyx 2 exceeds the resources available for consumption.  To distribute these scarce resources effectively, we need a scheduling and resource management package with multiple capabilities.  In particular, it must accept standard interactive user logins, allow batch jobs, backfill the system based on available resources, and permit system activities such as accounting to proceed without interruption.  The package must include a mechanism to treat the graphic pipes as a schedulable resource.  Also required is the ability to create advance reservations, offer dedicated system modes for large resource runs and benchmarking, and track the resources consumed for each job run.   Furthermore, our users want to be able to obtain repeatable timing results on job runs.  And, of course, package costs must be carefully considered.  We explored several options, including NQE and various third-party products, before settling on the PBS scheduler.
R. Armstrong, D. Gannon, A. Geist, K. Keahey, S. Kohn, L. McInnes, S. Parker, and B. Smolinski, "Toward a Common Component Architecture for High-Performance Scientific Computing," Preprint ANL/MCS-P759-0699, June 1999. This paper describes work in progress to develop a standard for interoperability among high-performance scientific components.  This research stems from growing recognition that the scientific community needs to better manage the complexity of multidisciplinary simulations and better address scalable performance issues on parallel and distributed architectures.  Driving forces are the need for fast connections among components that perform numerically intensive work and for parallel collective interactions among components that use multiple processes or threads.  This paper focuses on the areas we believe are most crucial in this context, namely, an interface definition language that supports scientific abstractions for specifying component interfaces and a ports connection model for specifying component interactions.
M. Anitescu, "On the Rate of Convergence of Sequential Quadratic Programming with Nondifferentiable Exact Penalty Function in the Presence of Constraint Degeneracy," Math. Program. Ser. A 92 (2002), pp. 359-386.  Also Preprint ANL/MCS-P760-0699, June 1999. We analyze the convergence of the sequential quadratic programming (SQP) method for nonlinear programming for the case in which the Jacobian of the active constraints is rank deficient at the solution and/or strict complementarity does not hold for some or any feasible LaGrange multipliers.  We use a nondifferentiable exact penalty function, and we prove that the sequence generated by the SQP is locally R-linearly convergent if the matrices of the quadratic program are uniformly positive definite and bounded, provided that the Mangasarian-Fromowitz constraint qualification and some second-order sufficiency conditions hold.
M. Anitescu, "Degenerate Nonlinear Programming with a Quadratic Growth Condition," SIAM J. Optim. 10(4) (2000), pp. 116-1135.  Also Preprint ANL/MCS-P761-0699, June 1999. We show that the quadratic growth condition and the Mangasarian-Fromowitz constraint qualification imply that local minima of nonlinear programs are isolated stationary points. As a result, when started sufficiently close to such points, an L_\infty exact penalty sequential quadratic programming algorithm will induce at least R-linear convergence of the iterates to such a local minimum.  We construct an example of a degenerate nonlinear program with a unique local minimum satisfying the quadratic growth and the Mangasarian-Fromowitz constraint qualification but for which no positive semidefinite augmented Lagrangian exists.  We present numerical results obtained using several nonlinear programming packages on this example, and discuss its implications for some algorithms.
H. M. Tufo and P. F. Fischer, "Terascale Spectral Element Algorithms and Implementations," Preprint ANL/MCS-P762-0699, June 1999. We describe the development and implementation of an efficient spectral element code for multimillion gridpoint simulations of incompressible flows in general two- and three-dimensional domains.  We review basic and recently developed algorithmic underpinnings that have resulted in good parallel and vector performance on a broad range of architectures, including the terascale computing systems now coming on line at the DOE labs.  Sustained performance of 219 GFLOPS has been recently achieved on 2048 nodes of the Intel ASCI-Red machine at Sandia.
O. Zaki, E. Lusk, W. Gropp, and D. Swider, "Toward Scalable Performance Visualization with Jumpshot," Int'l J. of High-Performance Computing Applications (to appear).  Also Preprint ANL/MCS-P763-0699, June 1999. Jumpshot is a graphical tool for understanding the performance of parallel programs.  It is in the tradition of the upshot tool, but contains a number of extensions and enhancements that make it suitable for large-scale parallel computations.  Jumpshot takes as input a new, more flexible logfile format, and comes with a library for generating such logfiles.  An MPI profiling library is also included, enabling the automatic generation of such logfiles from MPI programs. Jumpshot is written in Java, and can easily be integrated as an applet into browser-based computing environments.  The most novel feature of Jumpshot is its automatic detection of anomalous durations, drawing the user's attention to problem areas in a parallel execution.  This capability is particularly useful in large-scale parallel computations containing very many events.
G. W. Crabtree, D. O. Gunter, H. G. Kaper, A. E. Koshelev, G. K. Leaf, and V. M. Vinokur, "Numerical Simulations of Driven Vortex Systems," Physical Review B 61, no. 2 (Jan. 2000), pp. 1446-1455.  Also Preprint ANL/MCS-P764-0699, June 1999. This article reports on several large-scale numerical simulations of vortex systems that are driven through superconducting media with defects.  The simulations are based on the time-dependent Ginzburg-Landau equations.   The simulations demonstrate regimes of plastic and elastic steady-state motion in the presence of a twin boundary, show the effect of regular and irregular arrays of point defects on vortex trajectories, and show a mechanism by which vortices move through an array of columnar defects.  Also presented are the results of some transient simulations in two and three dimensions, which show that, in the transition from the Meissner state to the vortex state, vortices are formed by a process of deposition.
D. O. Gunter, H. G. Kaper, and G. K. Leaf, "Implicit Integration of the Time-Dependent Ginzburg-Landau Equations of Superconductivity," SIAM J. Sci. Comput. 23, no. 9 (2002), pp. 1944-1959.  Also Preprint ANL/MCS-P765-0699, June 1999, revised July 2000. This article is concerned with the integration of the time-dependent Ginzburg-Landau (TDGL) equations of superconductivity.  Four algorithms, ranging from fully explicit to fully implicit, are presented and evaluated for stability, accuracy, and compute time.  The benchmark problem for the evaluation is the equilibration of a vortex configuration in a superconductor that is embedded in a thin insulator and subject to an applied magnetic field.
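Schematically, the explicit-to-implicit range of integrators for an evolution equation u_t = F(u) can be pictured as a one-parameter theta-scheme (a generic illustration, not the paper's actual discretization of the TDGL system):

```latex
\frac{u^{n+1} - u^n}{\Delta t}
  \;=\; \theta\, F\!\left(u^{n+1}\right) + (1 - \theta)\, F\!\left(u^n\right),
\qquad \theta \in [0, 1],
```

with theta = 0 fully explicit and theta = 1 fully implicit; larger theta typically buys stability (a larger stable time step) at the cost of a nonlinear solve per step.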
M. K. Mihçak, P. Moulin, M. Anitescu, and K. Ramchandran, "Rate-Distortion-Optimal Subband Coding without Perfect-Reconstruction Constraints," Preprint ANL/MCS-P766-0699, June 1999. We investigate the design of subband coders without the traditional perfect-reconstruction constraint on the filters.  The coder uses scalar quantizers, and its filters and bit allocation are designed so as to optimize a rate-distortion criterion.  Convexity properties play a central role in the analysis.  Our results hold for a broad class of rate-distortion criteria.   First, we show that optimality can be achieved using filter banks that are the cascade of a (paraunitary) principal component filter bank for the input spectral process and a set of pre- and post-filters surrounding each quantizer.  An algorithm for computing the globally optimal filters and bit allocation is given.  We then develop closed-form solutions for the special case of two-channel coders under an exponential rate-distortion model.  Finally, we investigate a constrained-length version of the filter design problem, which is applicable to practical coding scenarios.  While the optimal filter banks are nearly perfect-reconstruction at high rates, we demonstrate an apparently surprising advantage of optimal FIR filter banks:  they significantly outperform optimal perfect-reconstruction FIR filter banks at all bit rates.
J. G. Michalakes, R. J. Purser, and R. Swinbank, "Data Structure and Parallel Decomposition Considerations on a Fibonacci Grid," Proc. Numer. Weather Prediction Conf., Denver, 9/13-17, 1999 (to appear).  Also Preprint ANL/MCS-P767-0699. The Fibonacci grid, proposed by Swinbank and Purser (1999), provides attractive properties for global numerical atmospheric prediction by offering an optimally homogeneous, geometrically regular, and approximately isotropic discretization, with (as always) only the polar regions requiring special numerical treatment.  It is a mathematical idealization, applied to the sphere, of the multi-spiral patterns often found in botanical structures, such as in pine cones and sunflower heads.  Computationally, it is natural to organize the domain into zones, in each of which the same pair (or possibly, triplet) of Fibonacci spirals dominates.  But the further subdivision of such zones into tiles of a shape and size suitable for distribution to the processors of a massively parallel computer requires careful consideration if the subsequent spatial computations along the respective spirals, especially those computations (such as compact differencing schemes) that involve recursion, are to be implemented in an efficient, load-balanced manner without requiring excessive amounts of inter-processor communication.  Even when traditional low-order explicit methods are used, care must still be taken.  We show how certain simple properties of the Fibonacci sequence (whose numbers prescribe the multiplicities of the members of the successive families of spirals that make up the grid) may be exploited in the subdivision of grid zones into arrangements of triangular grid tiles, each with two sides coincident with grid lines and a ragged "hypotenuse" approximately aligned east-west.
S. J. Benson, L. C. McInnes, and J. J. Moré, "GPCG: A Case Study in the Performance and Scalability of Optimization Algorithms," Preprint ANL/MCS-P768-0799, Sept. 2000. GPCG is an algorithm within the Toolkit for Advanced Optimization (TAO) for solving bound-constrained, convex quadratic problems.  Originally developed by Moré and Toraldo, this algorithm was designed for large-scale problems but had been implemented only for a single processor.  The TAO implementation is available for a wide range of high-performance architectures, and has been tested on up to 64 processors to solve problems with over 2.5 million variables.
L. A. Freitag and P. M. Knupp, "Tetrahedral Element Shape Optimization via the Jacobian Determinant and Condition Number," Preprint ANL/MCS-P769-0799, July 1999. We present a new shape measure for tetrahedral elements that is optimal in the sense that it gives the distance of a tetrahedron from the set of inverted elements.  This measure is constructed from the condition number of the linear transformation between a unit equilateral tetrahedron and any tetrahedron with positive volume.  We use this shape measure to formulate two optimization objective functions, which are differentiated by their goal: the first seeks to improve the average quality of the tetrahedral mesh, the second aims to improve the worst quality element in the mesh.  Because the element condition number is not defined for tetrahedra with negative volume, these objective functions can be used only when the initial mesh is valid.  Therefore, we formulate a third objective function using the determinant of the element Jacobian that is suitable for mesh untangling.  We review the optimization techniques used with each objective function, and present experimental results that demonstrate the effectiveness of the mesh improvement and untangling methods.   We show that a combined optimization approach that uses both condition number objective functions obtains the best quality meshes.
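In generic terms (the paper's precise normalization may differ): if A is the Jacobian of the affine map taking a unit equilateral reference tetrahedron to a physical element, the shape measure rests on the condition number

```latex
\kappa(A) \;=\; \|A\|\,\|A^{-1}\|,
```

which attains its minimum for an ideally shaped element, grows without bound as the element degenerates, and is undefined when \det A \le 0, which is why a separate determinant-based objective is needed for untangling.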
H. G. Kaper and S. Tipei, "Formalizing the Concept of Sound," Preprint ANL/MCS-P771-0799, July 1999. The motion of formalized music implies that a musical composition can be described in mathematical terms.  In this article we explore some formal aspects of music and propose a framework for an abstract approach.
S. J. Wright and D. Orban, "Properties of the Log-Barrier Function on Degenerate Nonlinear Programs," Preprint ANL/MCS-P772-0799, July 1999. We examine the sequence of local minimizers of the log-barrier function for a nonlinear program near a solution at which second-order sufficient conditions and the Mangasarian-Fromovitz constraint qualification are satisfied, but the active constraint gradients are not necessarily linearly independent.  When a strict complementarity condition is satisfied, we show uniqueness of the local minimizer of the barrier function in the vicinity of the nonlinear program solution, and obtain a semi-explicit characterization of this point.  When strict complementarity does not hold, we obtain several other interesting characterizations, in particular, an estimate of the distance between the minimizers of the barrier function and the nonlinear program in terms of the barrier parameter, and a result about the direction of approach of the sequence of minimizers of the barrier function to the nonlinear programming solution.
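For reference, for the program min f(x) subject to c_i(x) \ge 0, the log-barrier function with parameter \mu > 0 is

```latex
P(x; \mu) \;=\; f(x) \;-\; \mu \sum_{i} \ln c_i(x),
```

and the results above concern the behavior of its local minimizers x(\mu) as \mu \downarrow 0.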
J. E. Flaherty, R. M. Loy, M. S. Shephard, and J. D. Teresco, "Software for the Parallel Adaptive Solution of Conservation Laws by Discontinuous Galerkin Methods," Preprint ANL/MCS-P773-0799, July 1999. We develop software tools for the solution of conservation laws using parallel adaptive discontinuous Galerkin methods.  In particular, the Rensselaer Partition Model (RPM) provides parallel mesh structures within an adaptive framework to solve the Euler equations of compressible flow by a discontinuous Galerkin method (LOCO).  Results are presented for a Rayleigh-Taylor flow instability for computations performed on 128 processors of an IBM SP computer.  In addition to managing the distributed data and maintaining a load balance, RPM provides information about the parallel environment that can be used to tailor partitions to a specific computational environment.
H. M. Tufo, P. F. Fischer, M. E. Papka, and M. Szymanski, "Hairpin Vortex Formation, a Case Study for Unsteady Visualization," Preprint ANL/MCS-P774-0799, July 1999. To better understand the vortex dynamics of coherent structures in turbulent and transitional boundary layers, we consider direct numerical simulation of the interaction between a flat-plate-boundary-layer flow and an isolated hemispherical roughness element.  Of principal interest is the evolution of hairpin vortices that form an interlacing pattern in the wake of the hemisphere, lift away from the wall, and are stretched by the shearing action of the boundary layer.  Using animations of unsteady three-dimensional representations of this flow, produced by the vtk toolkit and enhanced to operate in a CAVE virtual environment, we identify and study several key features in the evolution of this complex vortex topology not previously observed in other visualization formats.
W. McCune and O. Shumsky, "Ivy: A Preprocessor and Proof Checker for First-Order Logic," in Computer-Aided Reasoning: ACL2 Case Studies, M. Kaufmann, P. Manolios, and J. Moore, eds., Kluwer Academic Publishers (2000).  Also Preprint ANL/MCS-P775-0899, Aug. 1999. This case study shows how non-ACL2 programs can be combined with ACL2 functions in such a way that useful properties can be proved about the composite programs.  Nothing is proved about the non-ACL2 programs.  Instead, the results of the non-ACL2 programs are checked at run time by ACL2 functions, and properties of these checker functions are proved.  The application is resolution/paramodulation automated theorem proving for first-order logic.  The top ACL2 function takes a conjecture, preprocesses the conjecture, and calls a non-ACL2 program to search for a proof or countermodel.  If the non-ACL2 program succeeds, ACL2 functions check the proof or countermodel.  The top ACL2 function is proved sound with respect to finite interpretations.
W. K. Anderson, W. D. Gropp, D. K. Kaushik, D. E. Keyes, and B. F. Smith, "Achieving High Sustained Performance in an Unstructured Mesh CFD Application," Preprint ANL/MCS-P776-0899, Aug. 1999. Overview: Many applications of economic and national security importance require the solution of nonlinear partial differential equations (PDEs).  In many cases, PDEs possess a wide range of time scales---some (e.g., acoustic) faster than the phenomena of prime interest (e.g., convective), suggesting the need for implicit methods.  In addition, many applications are geometrically complex and possess a wide range of length scales, requiring an unstructured mesh to adequately resolve the problem without requiring an excessive number of mesh points and to accomplish mesh generation and adaptation (almost) automatically.  The best algorithms for solving nonlinear implicit problems are often Newton methods, which themselves require the solution of very large, sparse linear systems.  The best algorithms for these sparse linear problems, particularly at very large sizes, are often preconditioned iterative methods.  This nested hierarchy of tunable algorithms has proved effective in solving complex problems in areas such as aerodynamics, combustion, radiation transport, and global circulation.  Typically, for steady-state solutions from a trivial initial guess, the number of "work units" (evaluations of the discrete residuals on the finest mesh on which the problem is represented) is around 10^3 (to achieve reductions in the norm of the residual of 10^-8 to 10^-12).  Although these algorithms are efficient (in the sense of using relatively few floating-point operations to arrive at the final result), they do not necessarily achieve the absolute flops-per-second (flop/s) ratings that less efficient or less versatile algorithms may.
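The nested hierarchy described here is the inexact Newton-Krylov pattern: for the discretized system F(u) = 0, each outer Newton step solves the correction equation only approximately with a preconditioned Krylov method (a generic statement, not the paper's notation):

```latex
\left\| F(u_k) + J(u_k)\, \delta u_k \right\| \;\le\; \eta_k \left\| F(u_k) \right\|,
\qquad
u_{k+1} \;=\; u_k + \lambda_k\, \delta u_k,
```

where J is the Jacobian, \eta_k is the forcing term that governs how accurately the inner iteration must converge, and \lambda_k is a damping or line-search parameter.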
P. F. Fischer and J. S. Mullen, "Filter-based Stabilization of Spectral Element Methods," C. R. Acad. Sci. Paris 332(1) (2001) 265-270.  Also Preprint ANL/MCS-P777-0899, Aug. 1999. We present a simple filtering procedure for stabilizing the spectral element method (SEM) for the unsteady advection-diffusion and Navier-Stokes equations.  A number of example applications are presented, along with basic analysis for the advection-diffusion case.
P. F. Fischer and H. M. Tufo, "High-Performance Spectral Element Algorithms and Implementations," Preprint ANL/MCS-P778-0899, Aug. 1999. We describe the development and implementation of a spectral element code for multimillion gridpoint simulations of incompressible flows in general two- and three-dimensional domains.  Parallel performance is presented for up to 2048 nodes of the Intel ASCI-Red machine at Sandia.
H. M. Tufo, P. F. Fischer, M. E. Papka, and K. Blom, "Numerical Simulation and Immersive Visualization of Hairpin Vortices," Preprint ANL/MCS-P779-0899, Aug. 1999. To better understand the vortex dynamics of coherent structures in turbulent and transitional boundary layers, we consider direct numerical simulation of the interaction between a flat-plate-boundary-layer flow and an isolated hemispherical roughness element.  Of principal interest is the evolution of hairpin vortices that form an interlacing pattern in the wake of the hemisphere, lift away from the wall, and are stretched by the shearing action of the boundary layer.  Using animations of unsteady three-dimensional representations of this flow, produced by the vtk toolkit and enhanced to operate in a CAVE virtual environment, we identify and study several key features in the evolution of this complex vortex topology not previously observed in other visualization formats.
J. L. Tilson, R. Shepard, C. Naleway, A. F. Wagner, and W. C. Ermler, "Ab Initio Determination of Americium Ionization Potentials," Preprint ANL/MCS-P780-0899, Aug. 1999. The first three ionization potentials of americium are calculated using ab initio spin-orbit configuration interaction techniques.   These results are favorably compared to available experimental and previous theoretical work.  The lowest two ionization potentials are accurately determined using wave functions constructed as simple single and double substitutions from a self-consistent field reference configuration with scalar relativistic effects included through an averaged relativistic pseudopotential.  A determination of the third ionization potential to comparable accuracy requires inclusion of the spin-orbit operator and significant intermediate coupling with a resulting configuration expansion length in excess of 1.9 million double-group adapted functions.  The solution to this problem was achieved by application of a new parallel spin-orbit configuration interaction component to the COLUMBUS Program System.  A decomposition of the ionization potential calculation into parts either sensitive or largely insensitive to the spin-orbit operator is favorably tested, leading to hybrid calculations of improved accuracy.
M. D'Souza, M. F. Romine, and N. Maltsev, "SENTRA, a Database of Signal Transduction Proteins," Nucleic Acids Research (to appear Jan. 2000).  Also Preprint ANL/MCS-P781-0999, Sept. 1999. SENTRA is a database of proteins associated with microbial signal transduction.  The database currently includes the classical two-component signal transduction pathway proteins and methyl-accepting chemotaxis proteins, but will be expanded to include other classes of signal transduction systems that are modulated by phosphorylation or methylation reactions.  Although the majority of database entries are from prokaryotic systems, eukaryotic proteins with bacterial-like signal transduction domains are also included.  Currently SENTRA contains signal transduction proteins in 34 complete and almost completely sequenced prokaryotic genomes, as well as sequences from 243 organisms available in public databases (SwissProt and EMBL).  The analysis was carried out within the framework of the WIT2 system, which is designed and implemented to support genetic sequence analysis and comparative analysis of sequenced genomes.
J. L. Tilson, W. C. Ermler, and R. M. Pitzer, "Parallel Spin-Orbit Coupled Configuration Interaction," Computer Physics Communications 128 (2000), pp. 128-138.  Also Preprint ANL/MCS-P782-0999, Sept. 1999. A parallel spin-orbit configuration interaction (SOCI) code has been developed.  This code, named P-SOCI, is an extension of an existing sequential SOCI program and permits solutions to heavy-element systems requiring both explicit spin-orbit (SO) effects and significant electron correlation.  The relativistic procedure adopted here is an ab initio conventional configuration interaction (CI) method that constructs a Hamiltonian matrix in a double-group-adapted basis.  P-SOCI enables solutions to problems far larger than possible with the original code by exploiting the resources of large massively parallel processing computers (MPP).  This increase in capability permits not only the continued inclusion of explicit spin-orbit effects but now also a significant amount of non-dynamic and dynamic correlation as is necessary for a good description of heavy-element systems.
S. J. Wright, "Recent Developments in Interior-Point Methods," Proc. 19th IFIP TC7 Conference on System Modeling and Optimization, Kluwer Academic Publishers (to appear 2000).  Also Preprint ANL/MCS-P783-0999, Sept. 1999. The modern era of interior-point methods dates to 1984, when Karmarkar proposed his algorithm for linear programming.  In the years since then, algorithms and software for linear programming have become quite sophisticated, while extensions to more general classes of problems, such as convex quadratic programming, semidefinite programming, and nonconvex and nonlinear problems, have reached varying levels of maturity.  Interior-point methodology has been used as part of the solution strategy in many other optimization contexts as well, including analytic center methods and column-generation algorithms for large linear programs.  We review some core developments in the area and discuss their impact on these other problem areas.
R. Cools and J. N. Lyness, "Three- and Four-Dimensional K-Optimal Lattice Rules of Moderate Trigonometric Degree," Math. Comp. 70, no. 236 (2001), 1549-1567.  Also Preprint ANL/MCS-P784-1199, Nov. 1999, revised 6/00. A systematic search has been undertaken for optimal lattice rules of specified trigonometric degree d over the hypercube [0,1)^s.  The search is restricted to a population K(s,\delta) of lattice rules Q(\Lambda).  This includes those where the dual lattice \Lambda^\perp may be generated by s points h for each of which |h| = \delta = d + 1.  The underlying theory, which suggests that such a restriction might be helpful, is presented.  The general character of the search is described, and, for s = 3, d < 21, a list of K-optimal rules is given.  It is not known whether these are also optimal rules in the general sense; this matter is discussed.
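In generic notation, an N-point lattice rule and the dual lattice appearing in the degree criterion are

```latex
Q(\Lambda) f \;=\; \frac{1}{N} \sum_{x_j \in \Lambda \cap [0,1)^s} f(x_j),
\qquad
\Lambda^{\perp} \;=\; \bigl\{\, h \in \mathbb{Z}^s :\; h \cdot x \in \mathbb{Z}
\ \ \forall\, x \in \Lambda \,\bigr\},
```

with |h| = |h_1| + \cdots + |h_s|; the rule has trigonometric degree at least d exactly when every nonzero h in \Lambda^\perp satisfies |h| \ge d + 1, which is why the search criterion above fixes |h| = \delta = d + 1 for the generators.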
G. von Laszewski, M. L. Westbrook, C. Barnes, I. Foster, E. M. Westbrook, "Using Computational Grid Capabilities to Enhance the Capability of an X-Ray Source for Structural Biology," Preprint ANL/MCS-P785-1299, Dec. 1999. The Advanced Photon Source at Argonne National Laboratory enables structural biologists to perform state-of-the-art crystallography diffraction experiments with high-intensity X-rays.  The data gathered during such experiments is used to determine the molecular structure of macromolecules to enhance, for example, the capabilities of modern drug design for basic and applied research.  The steps involved in obtaining a complete structure are computationally intensive and require the proper adjustment of a considerable number of parameters that are not known a priori.  Thus, it is advantageous to develop a computational infrastructure for solving the numerically complex problems quickly, in order to enable quasi-real-time information discovery and computational steering.  Specifically, we propose that the time-consuming calculations be performed in a "computational grid" accessing a large number of state-of-the-art computational facilities.  Furthermore, we envision that experiments could be conducted by researchers at their home institution via remote steering while a beamline technician performs the actual experiment; such an approach would be cost-efficient for the user.  We conducted a case study involving multiple tasks of a structural biologist, including data acquisition, data reduction, solution of the phase problem, and calculation of the final result---an electron density map, which is subsequently used for modeling of the molecular structure.  We developed a parallel program for the data reduction phase that reduces the turnaround time significantly.  We also distributed the solution of the phase problem in order to obtain the resulting electron density map more quickly.  We used the GUSTO testbed provided by the Globus metacomputing project as the source of the necessary state-of-the-art computational resources, including workstation clusters.