|Title||Reducing Communication in Parallel Breadth-First Search on Distributed Memory Systems |
|Publication Type||Report |
|Year of Publication||2014 |
|Authors||Lu, H, Tan, G, Chen, M, Sung, N |
|Other Numbers||ANL/MCS-P5226-1114 |
|Abstract||Breadth-first search (BFS) is a key operation in data-intensive graph analysis applications. However, for distributed BFS algorithm on large distributed memory systems, data communication often limits the scalability of the algorithm as it costs significantly more than arithmetic computation. In this work, we try to reduce the communication cost in distributed BFS by sieving and compressing the messages. First, we propose a novel distributed directory to sieve the redundant data in collective communications. Then we leverage a bitmap compression algorithm to further reduce the size of messages in communication. Experiments on a 6,144-core Intel Westmere based cluster show our algorithm achieve a BFS performance rate of 12.1 billion edge visits per second on an undirected graph of 8 billion vertices and 128 billion edges with scale-free distribution. Compared to the “replicated-csr” version BFS in Graph500, our algorithm reduces communication cost by 79.0% and gets a speedup of 2.2×.