As noted in the preceding section, MPICH collective operations are implemented on top of MPICH point-to-point operations. MPICH collective operations retrieve the hidden communicator from the communicator passed in the argument list and then use standard MPI point-to-point calls with this hidden communicator. We use straightforward ``power-of-two''-based algorithms to provide scalability; however, considerable opportunities for further optimization remain.
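To make the ``power-of-two'' idea concrete, the following is a minimal sketch of one common scheme of this kind, a binomial-tree broadcast rooted at rank 0. It is an illustration under stated assumptions, not the MPICH source: the function name `simulate_binomial_bcast` and the constant `MAX_PROCS` are hypothetical, and the message exchange is simulated with an array rather than performed with MPI point-to-point calls. In the round with distance `mask`, every rank below `mask` sends to the rank `mask` positions above it, so the number of ranks holding the data doubles each round.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PROCS 1024   /* hypothetical upper bound for this sketch */

/* Simulate a binomial-tree broadcast from rank 0 among nprocs ranks.
 * Returns the number of communication rounds and stores the total
 * number of point-to-point messages in *sends. */
int simulate_binomial_bcast(int nprocs, int *sends)
{
    bool has_data[MAX_PROCS] = { false };
    int rounds = 0;

    assert(nprocs > 0 && nprocs <= MAX_PROCS);
    has_data[0] = true;          /* the root starts with the data */
    *sends = 0;

    /* mask takes the values 1, 2, 4, ...; in each round every rank
     * r < mask (which already holds the data) sends to rank r + mask,
     * provided that rank exists. */
    for (int mask = 1; mask < nprocs; mask <<= 1) {
        for (int r = 0; r < mask; r++) {
            int dest = r + mask;
            if (dest < nprocs) {
                assert(has_data[r] && !has_data[dest]);
                has_data[dest] = true;
                (*sends)++;
            }
        }
        rounds++;
    }

    /* Every rank must have received the data exactly once. */
    for (int r = 0; r < nprocs; r++)
        assert(has_data[r]);

    return rounds;
}
```

For `nprocs` ranks the loop runs for the smallest number of rounds k with 2^k >= `nprocs`, and `nprocs - 1` messages are sent in total. A broadcast from an arbitrary root can reuse the same schedule by working with ranks relative to the root.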
Although the basic implementation of MPICH collective operations uses
point-to-point operations, special versions also exist, including both
vendor-supplied and shared-memory versions. To allow the use
of these special versions on a communicator-by-communicator basis,
each communicator contains a list of function pointers that point to
the functions that implement the collectives for that particular
communicator. The structure holding these pointers contains
a reference count so that communicators can share the same
list of pointers.
typedef struct MPIR_COLLOPS {
    int (*Barrier) (MPI_Comm comm);
    int (*Bcast) (void *buffer, int count, MPI_Datatype datatype,
                  int root, MPI_Comm comm);
    ... other function pointers ...
    int ref_count;    /* So we can share it */
} MPIR_COLLOPS;
Each MPI collective operation checks the validity of the input arguments, then
forwards them to the function reached through the corresponding pointer for the
particular communicator. This approach allows vendors to substitute
system-specific implementations for all or some of the collective routines.
Currently, Meiko, Intel, and Convex have provided vendor-specific collective
implementations. These implementations follow system-specific strategies; for
example, the Convex SPP collective routines make use of both shared memory
and the memory hierarchies in the SPP.
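The per-communicator dispatch and the sharing of a pointer table through a reference count can be sketched as follows. This is a simplified illustration, not the MPICH code: the types `Comm` and `CollOps`, the `last_impl` field, and all function names are hypothetical stand-ins, only a barrier entry is shown, and the two implementations do nothing except record which one was called.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the communicator and its table of
 * collective operations; these are not the MPICH types. */
typedef struct Comm Comm;

typedef struct CollOps {
    int (*Barrier)(Comm *comm);
    int ref_count;            /* number of communicators sharing this table */
} CollOps;

enum { IMPL_NONE, IMPL_P2P, IMPL_VENDOR };

struct Comm {
    int size;                 /* number of processes in the communicator */
    CollOps *collops;         /* table used by this communicator */
    int last_impl;            /* illustration only: which version ran last */
};

/* Two placeholder implementations of the same collective. */
int barrier_p2p(Comm *comm)    { comm->last_impl = IMPL_P2P;    return 0; }
int barrier_vendor(Comm *comm) { comm->last_impl = IMPL_VENDOR; return 0; }

/* Allocate a table that no communicator references yet. */
CollOps *collops_create(int (*barrier)(Comm *))
{
    CollOps *ops = malloc(sizeof *ops);
    if (ops == NULL)
        return NULL;
    ops->Barrier = barrier;
    ops->ref_count = 0;
    return ops;
}

/* Let a communicator use (and share) an existing table. */
void comm_attach(Comm *comm, CollOps *ops)
{
    ops->ref_count++;
    comm->collops = ops;
}

/* Drop a communicator's reference; free the table when the last
 * reference goes away. Returns the remaining reference count. */
int comm_detach(Comm *comm)
{
    CollOps *ops = comm->collops;
    int remaining;

    assert(ops != NULL && ops->ref_count > 0);
    remaining = --ops->ref_count;
    if (remaining == 0)
        free(ops);
    comm->collops = NULL;
    return remaining;
}

/* User-facing entry point: check the arguments, then forward the call
 * through the function pointer stored for this communicator. */
int comm_barrier(Comm *comm)
{
    if (comm == NULL || comm->collops == NULL || comm->size <= 0)
        return -1;
    return comm->collops->Barrier(comm);
}
```

In this sketch, two communicators attached to the same table run the same implementation, while a third attached to a table built with `barrier_vendor` runs the substituted version without any change to `comm_barrier`.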