Plans for Future Development
What is it that separates OpenAD as it is from the close-to-ideal
We see the following five areas where further effort is required to
OpenAD to the desired level of robustness, efficiency, and ease of use.
Priorities need to be set according to the requirements of our primary
and the level of available funding. The last statement probably applies
every large-scale software project based in an environment that is
characterized by limited resources - both human and financial.
Many other important issues, such as automatic sparsity detection,
parallel checkpointing, further AD-specific data-flow analyses, are
investigated by various other groups. We intent to evaluate their
for a possible inclusion in the pool of algorithms provided by OpenAD.
- Strategies for adjoint code
- pure adjoint code
(no preaccumulation) for efficient gradient computation:
Preaccumulation is advantageous if the adjoint model is supposed to be
many times at the same point, for example, to accumulate the full
for a problem with a moderate number of dependent outputs and a very
number of independent inputs. This situation does not apply to the
evaluation of a single gradient. Adjoint code in the classical sense is
likely to perform
- preaccumulation of
adjoint-augmented statement-graphs for efficient gradient computation:
At the level of single scalar assignments improvements over 1.1 can be
by considering the computation of the local gradients as a vertex
elimination problem in the adjoint-augmented computational graph. We
know how to solve
this problem. The result can be used to generate adjoint code that
less floating-point operations (flops) than the code generated by the
decisions about local code generation strategy by the adjoint compiler:
In items 1.1 and 1.2 we strive for a low number of flops. Another
relevant measure of complexity is the amount of memory required to
store the values that are used by the adjoint computation. For certain
bits of code it may be advantageous to preaccumulate the local Jacobian
and store all
its entries instead of storing the intermediate values that are
to compute this local Jacobian. This decision can be made by the
adjoint compiler based on its internal representation of the code and
of data-flow analyses (see point 2 below).
- AD-specific data-flow analyses
(flow-sensitive and flow-insensitive):
Knowing if a variable depends on an independent input and if some
output depends on it is fundamental for the ability to generate
efficient adjoint code. A conservative estimate can be computed
statically (at compile-time). The granularity of the information
generated by the activity
analysis has a strong impact on the complexity of generating the
and running it. The effective exploitation of very detailed activity
during the code generation phase is highly desirable.
requisiteness (TBR + extensions):
Building on the available information about activity it can be decided
a variable must be stored before its is overwritten or not.
the memory requirements of adjoint code can be decreased considerably.
requisiteness (for various reversal modes and types of checkpoints):
Different program reversal strategies (see point 3 below) require the
storage and retrieval
of various types of checkpoints. Depending on the context the size of
these checkpoints can be decreased considerably by exploiting the
information generated by further data-flow analyses.
- Program reversal modes
- static (e.g., split,
joint, delayed joint, etc.):
A number of call graph reversal modes are static in the sense that they
fully determined at compile-time. Therefore, the mapping between the
nodes in the extended dynamic call tree (EDCT - subroutine call pattern
is realized at run-time) must be well-defined at compiler-time such
the corresponding code can be generated. Statically reversed codes have
the advantage that the native compiler can potentially do a good job
- dynamic (based on heuristic running on extended DCT)
Often it may be advantageous to build the EDCT explicitly at run-time
make the decision about the reversal mode dynamically. Thus, the
of the EDCT as well as potential profiling information about the
of executing the subroutines that are associated with the single EDCT
can be exploited by heuristics for optimizing the adjoint code.
- user-guided (via
Sometimes the user may know that some reversal strategy gives good
a given section of the code. The user should be able to convey this
to the adjoint compiler via domain-specific directives (see 5).
- Language-coverage and library
handling in adjoint code
- coverage for
FortranXX, C, C++:
OpenAD has been designed to handle various programming languages. Work
underway for FortranXX, C and C++. Further effort is needed to reach
the desired level of robustness for the existing front-ends. New
for other programming languages (MatLab, Java, ...) should be added as
required. It is important to realize that software engineering problems
a large fraction of the issues involved in the development of adjoint
compiler technology. Without adequate programming support such a
is bound to fail.
- language concepts
(e.g., array arithmetic, pointers and dynamic
memory allocation, polymorphism):
Many language concepts, in particular those found in object-oriented
have never been considered in the context of automatic adjoint code
We are aware of several hard theoretical and technical problems that
to be considered in this context. Without an answer to these open
the correctness of the adjoint code cannot be guaranteed.
- support libraries
(MPI, netcdf, etc.):
Modern numerical simulation programs make extensive use of support
(for parallelism, i/o, ...) that need to be handled correctly in the
- AD-specific directives and intrinsics
- iterative linear and
Certain properties of numerical code that can be exploited during the
source transformation are hard to determine automatically
at compile-time. Iterative linear solvers do not need to be
if the mathematical context is understood and mapped to the adjoint
A delayed differentiation of nonlinear solvers (as soon as the values
almost converged) is desirable both for stability and efficiency.
The fact that a given section of the code is self-adjoint is hard to
statically. The user may know about this property and should be able to
convey this information to the adjoint compiler that can exploit it for
the generation of more efficient and numerically stable adjoint code.
- intrinsics catalog:
Certain subroutines are quasi-standard implementations in a given
area. Highly efficient and robust derivative code may be available and
be used as part of the automatically generated adjoint. OpenAD provides
catalog that allows the user to define application-specific
routines by providing code for the evaluation of their derivative. This
code can be used by the adjoint compiler in a plug-and-play fashion to
generate improved adjoint code.