TridiagLU - Parallel Direct Solver for Tridiagonal Systems

This code is a work-in-progress. The Git commit messages in the repository show the latest progress like bugfixes, addition of features, etc.

Download

With Bitbucket account and SSH keys set up:

git clone git@bitbucket.org:dog5011/tridiaglu.git

Otherwise:

git clone https://bitbucket.org/dog5011/tridiaglu.git

Overview

This is my implementation of a direct solver (LU decomposition) for non-periodic tridiagonal systems of equations in serial and parallel (MPI). It can solve a single system as well as a stack of independent systems. The latter is useful, for example, for compact finite-difference schemes applied to multi-dimensional problems. The following functions are available:-

  • tridiagLU(a,b,c,x,n,ns,r,m) (Parallel tridiagonal solver)
  • tridiagLUGS(a,b,c,x,n,ns,r,m) (Tridiagonal solver based on a gather-and-solve strategy)
  • tridiagIterJacobi(a,b,c,x,n,ns,r,m) (Iterative tridiagonal solver using the Jacobi method)
  • A test code is provided in /src/Test to check the tridiagonal solvers and test the walltimes. It runs a series of tests - solution of an identity matrix, upper-triangular tridiagonal matrix and a full tridiagonal matrix with random entries and right-hand sides to check for accuracy. It measures the walltimes by solving a set of systems repeatedly.

    Algorithm / References

    tridiagLU() solves the parallel tridiagonal system by rearranging the equations such that the last points of each sub-domain are grouped at the end. This results in a parallel elimination of the interior points, followed by the inversion of a reduced system of size nproc-1, whose elements are distributed as one element in each process except the root. This reduced system can either be solved by gathering on one processor, solving (in serial) and scattering back - tridiagLUGS(), or iteratively by tridiagIterJacobi(). Dr. Paul Fischer explained the algorithm and guided me in implementing this. Here is a rough sketch of the algorithm.

    tridiagLUGS() Each of the "ns" systems is gathered on one processor, solved in serial, and the solution scattered back. The parallelism lies in solving the "ns" different systems on multiple processors (i.e., each processor solves ~ns/nproc number of systems in serial).

    tridiagIterJacobi() uses Jacobi iterations to solve the system. See here.

    Scalability

    There are some scalability plots available here.

    Function Arguments

  • a (Size: [0,ns-1]x[0,n-1], Type: double*) - subdiagonal entries
  • b (Size: [0,ns-1]x[0,n-1], Type: double*) - diagonal entries
  • c (Size: [0,ns-1]x[0,n-1], Type: double*) - superdiagonal entries
  • x (Size: [0,ns-1]x[0,n-1], Type: double*) - right-hand side (solution)
  • n (Type: int) - local size of the system
  • ns (Type: int) - number of systems to solve
  • r (Type: TridiagLU*) - structure containing the parameters and runtimes (see README or include/tridiagLU.h for explanation)
  • m (Type: MPIContext*) - MPI Communicator (NULL for serial)
  • Return value (int): 0 (successful solve), -1 (singular system)

    Notes:-

  • x contains the final solution (right-hand side is replaced)
  • a,b,c are not preserved
  • On rank=0, a[0] has to be zero.
  • On rank=nproc-1, c[n-1] has to be zero.
  • Compiling

    The following steps should compile the code (the default prefix is the source folder itself, so the binary will be installed in tridiaglu/bin):

    autoreconf -i
    This will generate the required files for:
    [CFLAGS="..."] ./configure [--with-mpidir=/path/to/mpi] [--prefix=/install/dir]
    make
    make install

    ** A serial version can be compiled using the "-Dserial" compile flag.
    ** If MPI is unavailable or the configure script is unable to detect MPI, then a serial version of the code will get compiled.

    Testing

    Compiling the code results in compiling the tridiagonal solver functions as well as a code to test their performance and walltimes. The installed executable runs this test code.

    [/path/to/mpi/bin/]mpiexec -n $NPROC /path/to/TRIDIAGLU

    The run directory must contain a file "input" containing three integers: size of the system, number of systems and number of solves (for walltime measurements).

    The errors reported should be machine-zero.

    Using

    The test code demonstrates how to call these functions, including setting up the arrays a, b, c and x as well as the other arguments. To use these functions in another code, do either of the following:

    Copy the header file include/tridiagLU.h to the include folder of the code calling these functions. Copy all the source files in src/TridiagLU/ to the the code's src directory and include them while compiling.

    *OR*

    Place include/tridiagLU.h where the compiler can find it (or use the compiler flag "-I/path/to/tridiagLU.h" while compiling) and include $build_dir/src/TridiagLU/libTridiagLU.a while linking.