next up previous
Next: Conclusion Up: PADL Previous: PADL

PADL Example: Full Configuration Interaction

In electronic structure computations, the full configuration interaction (FCI) computation provides the exact solution of the electronic Schrödinger equation within the initial algebraic approximation of a finite one-particle basis set. It thus confers the ability to adjudicate among various approximate methods (SCF, many-body methods, and truncated CI) and, by comparison with experiment, permits assessment of deficiencies in the one-particle basis set and the Hamiltonian approximation.


  
Figure: Early in the FCI Computation
9#9

The algorithm ``wants'' to run on a shared-memory machine. This is managed on a distributed-memory machine like the Intel Delta, which was used in this particular case, by giving each processor two roles. One role is to compute, and the other is to serve a data manager for the ``shared'' data stored on that processor (its part of a large vector, which is ``wrapped around'' the machine. When a process needs data, it computes which node has the data it needs and sends a message that briefly interrupts the computation on that node while the data manager (specifically, the interrupt handler) sends the desired data back to the requester. What made this algorithm interesting is that there is no convenient way to compute the pattern of communication that will result from all these data requests. There is a real possibility that bottlenecks will occur that mean severe under-utilization of the machine while many of the processes are idle, waiting for data.

PADL was used to study this problem, and a snapshot is shown in Figure [*] The run was done on 128 processors of the Delta. Arrows show requests for data that leave their originators suspended until the requests can be satisfied. At the beginning of the run, the algorithm is proceeding the way it is supposed to, with little contention for data. After a while one's worst fears are realized, as most of the machine is idle while large numbers of processes contend for data. The display shown in Figure [*], which shows the problem just beginning to manifest itself, contains a hint of how to solve the problem: one can see that the contention is for data concentrated on the first few processors, and this data can easily be separated and separately distributed, producing less contention. Actually, the problem clears itself before too long, and only occurs sporadically after that.


next up previous
Next: Conclusion Up: PADL Previous: PADL
Karen D. Toonen
1998-11-19