Argonne National Laboratory

Effective Use of Dedicated Wide-Area Networks for High-Performance Distributed Computing

Publication Type: Report
Year of Publication: 2004
Authors: Karonis, NT; Papka, ME; Binns, J; Bresnahan, J; Link, JM
Date Published: 04/2004
Other Numbers: ANL/MCS-P1151-0404

Recent advances in Grid technology have made it possible to build so-called computational Grids, or simply Grids, which couple unique or rare resources that are geographically separated and span multiple administrative domains. Such Grids are invariably composed of heterogeneous networks in which, at the least, a high-performance switch accommodates intracluster messages and a separate, sometimes dedicated, high-bandwidth network serves intersite messages across the wide area. While such wide-area networks provide unprecedented bandwidth capacity and reliability, using them effectively remains an open challenge. Most applications use TCP by default for its ease of use and reliability, but the high bandwidth and high latency sometimes found on these networks induce enormous bandwidth-delay products that result in extremely large TCP congestion window sizes. This situation makes TCP a poor choice for data-intensive applications striving to achieve maximum bandwidth utilization on high-performance networks. To address this bandwidth utilization challenge for Grids connected over dedicated networks, we present a solution based on UDP with added reliability and on the Message Passing Interface (MPI) standard. MPI provides an interface that allows application programmers to ignore network heterogeneity. To study the efficacy of our approach, we integrated our implementation of the Reliable-Blast UDP protocol into MPICH-G2, our Grid-enabled MPI. We demonstrated this implementation in a data-intensive MPI Grid visualization application on the NSF TeraGrid and its dedicated high-bandwidth fiber-optic network. We observed an improvement in aggregate bandwidth utilization from 58 Mbps with MPICH-G2 using TCP alone to 9 Gbps with our technique.
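
The bandwidth-delay product mentioned above is the path bandwidth multiplied by the round-trip time. It gives the amount of data that must be in flight to keep the path full. The Python sketch below illustrates the arithmetic only. The link speed, round-trip time, and window size are hypothetical values chosen for illustration and are not taken from the report; the function names are likewise illustrative.

```python
def bdp_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product: bytes that must be in flight to fill the path."""
    return bandwidth_bps * rtt_ms / 8000  # bits -> bytes and ms -> s


def window_limited_bps(window_bytes, rtt_ms):
    """Upper bound on throughput when at most one window is sent per round trip."""
    return window_bytes * 8000 / rtt_ms  # bytes -> bits and ms -> s


if __name__ == "__main__":
    # All three values below are assumptions made for illustration.
    link_bps = 10e9      # hypothetical 10 Gbps wide-area path
    rtt_ms = 60          # hypothetical 60 ms round-trip time
    window = 64 * 1024   # 64 KiB, roughly the largest window without TCP window scaling

    print(f"BDP: {bdp_bytes(link_bps, rtt_ms) / 1e6:.1f} MB")
    print(f"Window-limited rate: {window_limited_bps(window, rtt_ms) / 1e6:.2f} Mbps")
```

Under these assumed values, a sender whose window is far smaller than the bandwidth-delay product can use only a small fraction of the path, which is the utilization gap the abstract describes.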