next up previous
Next: MPI Up: Programming Packages and Tools Previous: Chameleon

Parallel Research on Invariant Subspace Methods

(Contributed by Christian Bischof and Xiaobai Sun)

Successive Band Reduction (SBR) is an approach for the orthogonally reduction of matrices to condensed form allowing for the use of matrix-matrix (BLAS-3) kernels. Special instances of SBR are the tridiagonal and Hessenberg reductions used in various eigensolvers. In addition, SBR supports general bandreduction, which is needed for banded eigenvalue scenarios. A modified SBR approach is also used in the PRISM (Parallel Research on Invariant Subspace Methods) project for the development of a scalable parallel eigenvalue solver.

The SBR code is intended for distributed-memory MIMD parallel machines. It uses the Chameleon programming system, and its support for parallel operations on disjoint nodes subsets is critical in exploiting the multiple levels of parallelism in the algorithm. The SBR code had initially been implemented on the DELTA.

Table [*] shows some early performance results of the SBR code when applied to the Hessenberg reduction of a full matrix. ``N'' is the matrix size, ``nb'' is the block size used in the reduction of the matrix to ``b'' subdiagonal bands, ``time'' is the elapsed time in seconds, and ``Gflops'' is the sustained double-precision performance in Gflops/sec on the 16-node IBM SP1 system. These runs used the EUIH transport layer and the vendor-supplied `blas.a' library.


 
Table: Results for SBR code applied to a full matrix
N nb b Time Gflops
1000 1 20 19.9 0.233
1020 15 15 11.5 0.412
2000 1 20 164.1 0.227
2000 10 10 63.1 0.582
3000 1 10 512.5 0.246
3000 15 15 165.4 0.798
4000 20 20 346.0 0.849
 

Note that codes exploiting matrix-matrix kernels (nb > 1) perform much better than codes based on matrix-vector kernels (nb = 1). In particular, using a blocked algorithm, we can reduce a 5#5 matrix to a bandwidth of 15 roughly in the same time that we can tridiagonalize a 6#6 matrix, even though the former operation takes three times as many floating-point operations.


next up previous
Next: MPI Up: Programming Packages and Tools Previous: Chameleon
Karen D. Toonen
1998-11-18