OSPRI: An Optimized One-Sided Communication Runtime for Leadership-Class Machines
|Title||OSPRI: An Optimized One-Sided Communication Runtime for Leadership-Class Machines|
|Publication Type||Conference Paper|
|Year of Publication||2011|
|Authors||Hammond, JR, Dinan, J, Balaji, P, Kabadshow, I, Potluri, S, Tipparaju, V|
|Conference Name||The 6th Conference on Partitioned Global Address Space Programming Models|
|Conference Location||Santa Barbara, CA|
Partitioned Global Address Space (PGAS) programming models provide a convenient approach to implementing complex scientific applications by providing access to a large, globally accessible address space. Global Arrays (GA) is a popular PGAS model that is focused on providing an efficient, productive interface to distributed shared global arrays and is used by several important scientific computing applications, including the NWChem computational chemistry suite. While the communication runtime of GA (named ARMCI) has been optimized for several platforms, its architecture was fundamentally designed for general-purpose cluster computing systems with full-fledged operating systems. In the recent past, the largest systems in the world have been increasingly moving towards custom lightweight operating systems that are more tightly coupled with the hardware architecture and its usage environment. For such platforms, however, communication runtime architectures such as ARMCI might not map optimally. In this work, we describe a new communication runtime for PGAS models such as GA, termed OSPRI (One-Sided PRImitives). OSPRI presents several changes in architecture from conventional one-sided communication systems that make it better suited for emerging leadership-class machines. We describe the implementation of the IBM Blue Gene/P target for OSPRI and demonstrate significant improvements in latency, bandwidth, and scalability over well-tuned ARMCI and GA implementations on this system.