P. Balaji, H. Naik, and N. Desai, "Understanding Network Saturation Behavior on Large-Scale Blue Gene/P Systems," Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems, IEEE Computer Society, 2009, pp. 586-593. Also Preprint ANL/MCS-P1671-0909, September 2009.
As researchers continue to architect massive-scale systems, it is becoming clear that these systems will share a significant amount of hardware between processing units. Systems such as the IBM Blue Gene (BG) and Cray XT have started using flat networks (a.k.a. scalable networks), which differ from switched fabrics in that they use a 3D torus or similar topology. This allows the network to grow only linearly with system scale, instead of the super-linear growth needed for full fat-tree switched topologies, but at the cost of increased network sharing between processing nodes. While in many cases a full fat-tree provides more bisection bandwidth than is needed, it is not clear whether the other extreme of a flat topology is sufficient to move data around the network efficiently. In this paper, we study the network behavior of the IBM BG/P using several application communication kernels, and monitor network congestion behavior based on detailed hardware counters. Our studies scale from small systems up to 8 racks (32,768 cores) of BG/P, and provide several insights into the network communication characteristics of the system.