Recall from Section 3.7.2 that a hypercube connects each of P tasks ( P a power of 2) to other tasks (Figure 3.16). The template considered in this chapter uses this communication structure in an SPMD fashion, with each task executing Algorithm 11.1. A local state variable is first set to be the supplied input data. Computation then proceeds in steps. In each step, each task first exchanges its local state with one of its neighbors in the hypercube and then combines the message received from the neighbor with state to generate a new state. The output of the computation is the state generated in the final step.
In Algorithm 11.1, the XOR function denotes an exclusive or operation and is used to identify neighbors. (Exclusive or is defined as follows: 0 XOR 0=0, 0 XOR 1=1, 1 XOR 0=1, 1 XOR 1=0.) As noted in Section 3.7.2, the hypercube has the property that the binary labels of two nodes that are neighbors in the d th dimension differ only in the d th place; hence, the expression myid XOR yields the i th neighbor of node myid.
A particular parallel algorithm is defined by the operator OP used to combine state and message at each step in the template. In the following, we shall show how this template can be used as a basis for parallel vector reduction, matrix transposition, and sorting algorithms.
© Copyright 1995 by Ian Foster