ACTS Toolkit Review
January 18-19, 2000
Oakland, California

Toolkit for Advanced Optimization

Mathematics and Computer Science Division
Argonne National Laboratory

Overview

The Toolkit for Advanced Optimization (TAO) focuses on the design of large scale optimization software, including nonlinear least squares, unconstrained minimization, bound constrained optimization, and general nonlinear optimization. The solution of such problems pervades many areas of computational science and demands robust and flexible solution strategies. Since many application problems require the computational power of high-performance computers, a need clearly exists for a uniform and flexible framework for developing software designed to solve large scale optimization problems.

Various software packages are available for solving these problems; however, their portability, versatility, and scalability are restricted within parallel environments. The current generation of numerical software generally has a rigid form that imposes many limitations, even when restricted to uniprocessor architectures. In traditional software design, the expressions of algorithms make assumptions about the way mathematical objects, such as vectors and matrices, are represented by the computer. Thus, users are forced to convert from the natural representation of data for a particular application to one imposed by the software developer, often at the expense of considerable overhead. In addition, library routines are often characterized by long and complicated calling sequences, with no consistent interface among algorithms that solve a particular class of problems. These issues are magnified by the very nature of multiprocessor architectures, since robust and efficient implementation of mathematical abstractions involves the added considerations of parallel data structures and communication. An effective software package should exploit different parallel programming techniques for various phases of the solution process.

The TAO design philosophy uses object-oriented techniques of data and state encapsulation and abstract classes to create a flexible optimization toolkit. The algorithms in the toolkit place strong emphasis on the reuse of external tools where appropriate. Our design enables bidirectional connection to lower level linear algebra support (e.g., parallel sparse matrix data structures) provided in toolkits such as PETSc, ISIS++, and SuperLU, as well as higher level application frameworks. Our design decisions are strongly motivated by the challenges inherent in the use of large-scale distributed memory architectures and the reality of working with large, often poorly structured legacy codes for specific applications.

Progress

Initial work in the TAO project centered on the development of a core library for optimization. We have made significant progress on developing implementations for the TAO components as summarized below:

In particular, the alpha version of the algorithms above will form the basis for the first release of TAO, planned for January 2000. We expect that this software release will lead to interactions with potential customers at DOE laboratories and elsewhere.

Benchmarks

In our work we have emphasized the benchmarking of the codes to make sure that performance is adequate for applications. In the results mentioned below we discuss results for two algorithms, the limited-memory variable metric algorithm (LMVM) and the gradient projection/conjugate gradient (GPCG) algorithm.

The limited-memory variable-metric algorithm (LMVM) requires only function and gradient information. This algorithm is especially useful for large scale problems when the Hessian matrix is either unavailable or too expensive to compute. In this algorithm the approximation to the Hessian matrix is not stored explicitly; instead, it is defined implicitly in terms of a fixed number of vectors.

We have benchmarked the LMVM algorithm on a minimal surface problem, which is a variational problem defined over a two-dimensional rectangular grid. In order to test the effectiveness of LMVM to solve large problems, we present sample results for a fixed-size problem with 2,560,000 variables that is run on varying numbers of processors of an IBM SP with 120 MHz P2SC nodes with two 128 MB memory cards each and a TB3 switch.


Processors Iterations Time (sec) Efficiency
4 1896 6360 100
8 1887 3160 100
16 1934 1602 99
32 1958 865 92
64 1853 423 94

These results show that LMVM has excellent parallel efficiency for large problems, where we achieve efficiency for this fixed-size problem in excess of 92% as processors range from four through sixty-four. At first sight it is surprising that the number of iterations varies with the number of processors. This behavior is to be expected from optimization algorithms that require a large number of iterations since optimization algorithms tend to be discontinuous with respect to input data. We note that the dominant operations in the LMVM algorithm involve vectors (e.g., AXPYs, dot products, norms); in particular, no matrix manipulations are needed since the Hessian matrix is not defined explicitly.

We have also benchmarked the GPCG algorithm on the journal bearing problem. We present sample results for a fixed-size problem with 2,560,000 variables on an IBM SP. The GPCG uses a gradient projection algorithm to predict the face where the solution lies, and a conjugate gradient algorithm to explore this face. Thus, at each iteration the GPCG algorithm must use a conjugate gradient algorithm to solve (approximately) a system of linear equations whose coefficient matrix is the submatrix of the Hessian of the quadratic with respect to the free variables for the current iterate. Since the free variables can change arbitrarily from iterate to iterate, solving these systems of linear equations may create a potential load-balancing bottleneck that must be addressed in an efficient implementation. In view of this description, our results below for GPCG are noteworthy in several ways.

Processors Faces Time (sec) Efficiency
8 46 3341 100
16 45 1781 94
32 46 928 90
64 46 522 80

First of all, the number of faces visited by GPCG is remarkably small. Other strategies can lead to a large number of gradient projection iterates, but the GPCG algorithm is remarkably efficient. Another interesting aspect is that because of the low memory requirements of iterative solvers, we are able to solve problems with over 2.5 million variables with only 8 processors. Strategies that rely on direct solvers are likely to need significantly more storage, and thus more processors. Finally, these results show that the GPCG implementation has excellent efficiency. For example, the efficiency of GPCG with respect to 8 processors ranges between 80% and 90%. This sustained efficiency is remarkable because the GPCG algorithm is solving a sequence of linear problems with a coefficient matrix set to the submatrix of the Hessian of the quadratic with respect to the free variables for the current iterate. Thus, our implementation's repartitioning of submatrices effectively deals with the load-balancing problem that is inherent in the GPCG algorithm or, more generally, active set methods. Additional details on these results can be found in the extended abstract.

We conclude our outline of progress in TAO by discussing interoperability between TAO software and external tools that provide linear algebra capabilities; we also discuss our progress in the development of OOQP.

Software Interoperability Issues

As mentioned in the overview, our design enables bidirectional connection to lower-level support (for example, parallel sparse matrix data structures, preconditioners, solvers) provided in toolkits such as PETSc, and thus we are able to build on top of these toolkits instead of having to redevelop code. The advantages in terms of development time are significant. The TAO optimization software components are written in a data-structure-neutral form that uses abstractions for vectors, matrices, and linear solvers. In particular, our current design for components (shown below) interfaces to the parallel matrix and vector objects provided by PETSc, as well as the linear equations solver (SLES) objects for the solutions of sparse linear equations with a variety of preconditioned Krylov methods.

As further discussed below, we plan that future versions of TAO will exploit the standardized sets of interfaces for linear algebra under development by the Equation Solver Interface forum, hence broadening the linear algebra support available.

OOQP

Our focus in OOQP is on implementation of an object-oriented interior-point code for convex quadratic programming problems. The algorithm and problem class lend themselves ideally to object-oriented design, since many subclasses of quadratic programming problems have highly specific structure that must be exploited for efficiency, whereas the workings of the interior-point algorithm (the choices of step length, centering parameter, etc.) are independent of this structure. It follows that the code objects that implement the higher levels of the algorithm can be reused across all applications, while the objects for solving the linear systems can be tailored to the structure of the application at hand. Typically, the linear-solve objects perform specialized block elimination operations that ultimately yield a general linear system (either symmetric or nonsymmetric), and standard linear algebra software can be called across the component interface to solve this reduced system. To date, we have implemented solvers for dense general quadratic programs, sparse general quadratic programs (using the SuperLU linear algebra package), and also the special quadratic programs that arise in Huber regression. We plan to build interfaces to other linear algebra solvers and to extend the range of special quadratic programs that can be handled efficiently in the OOQP framework to include linear-quadratic control problems and a variety of problems arising in statistical regression.

Future Work

Our plans during the next 12-18 months center on the release of the alpha version of TAO, and further development of the OOQP and NCO components of TAO. In summary our plans are:

  • Release the alpha version of TAO.
  • Develop OOQP (quadratic programming solvers).
  • Develop NCO (nonlinearly constrained solvers).
  • Evaluate alpha version of TAO on selected applications.
  • Continue interactions with the CCA forum.
  • Continue development of AD (automatic differentiation) tools for TAO.
  • Component-based Optimization Software

    TAO software components provide an ideal opportunity for testing and enhancing techniques that are under exploration by the DOE-wide Common Common Component Architecture (CCA) Forum, a group whose current membership is drawn from various DOE national laboratories and collaborating academic institutions and whose goal is to specify a component architecture for high-performance scientific computing. We have been actively involved in CCA discussions to assure that the CCA approach can serve the needs of TAO, and we plan to test evolving CCA ideas for specification of interactions among software tools. The diagram below shows how the components of TAO are likely to evolve.

    We also will experiment with the interface specifications for linear algebra under development within the Equation Solver Interface (ESI). Such interfacing capabilities offer the potential to dramatically improve the performance of TAO solvers by facilitating the use of newly emerging linear algebra algorithms and toolkits, in particular, parallel direct solvers and preconditioners for general indefinite systems.

    Interfacing with Automatic Differentiation Technology

    The use of abstractions for matrices and vectors in TAO also enables us to leverage automatic differentiation (AD) technology to facilitate the parallel computation of gradients and Hessians within optimization algorithms, and hence to provide a full range of support for scientific applications. We have demonstrated the viability of this approach through preliminary interfacing between parallel TAO solvers and the automatic differentiation tools ADIFOR and ADIC. For example, to support inexact Newton techniques for unconstrained minimization problems, we have begun to explore various AD approaches that assume the user simply provides code to compute the minimization function and optionally the gradient. Preliminary timing results using sparse AD of a user-provided gradient code via matrix coloring to compute a Hessian in a trust region Newton's method are given below for an elastic-plastic torsion model from the MINPACK-2 test suite. These results, which were run on an IBM SP for a problem of dimension 6400, indicate good scalability when problem sparsity is suitably exploited.

    Processors Time for Hessian Computation (sec)
    1 2.076
    2 1.060
    4 0.554
    8 0.281

    Future work will explore the development of TAO interfaces for special problem features (for example, partial separability, discretization stencil information) in automatic differentiation computations. We also will develop mechanisms for more fully automating the total solution process, which currently requires significant interaction with the user to specify dependent and independent variables, allocate work space, etc.

    Project Personnel

  • William Gropp
  • Lois Curfman McInnes
  • Jorge Moré
  • Barry Smith
  • Steve Wright
  • Postdoctoral

  • Steve Benson, University of Iowa, 1998.
  • Michael Gertz, University of California, San Diego, 1999.
  • Boyana Norris, University of Illinois at Urbana-Champaign, 1999.