A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard

William Gropp
Ewing Lusk
Mathematics and Computer Science Division
Argonne National Laboratory

Nathan Doss
Anthony Skjellum
Department of Computer Science &
NSF Engineering Research Center for CFS
Mississippi State University

MPI (Message Passing Interface) is a specification for a standard library for message passing that was defined by the MPI Forum, a broadly based group of parallel computer vendors, library writers, and applications specialists. Multiple implementations of MPI have been developed. In this paper, we describe MPICH, unique among existing implementations in its design goal of combining portability with high performance. We document its portability and performance and describe the architecture by which these features are simultaneously achieved. We also discuss the set of tools that accompany the free distribution of MPICH, which constitute the beginnings of a portable parallel programming environment. A project of this scope inevitably imparts lessons about parallel computing, the specification being followed, the current hardware and software environment for parallel computing, and project management; we describe those we have learned. Finally, we discuss future developments for MPICH, including those necessary to accommodate extensions to the MPI Standard now being contemplated by the MPI Forum.


Contents
  • Introduction
  • Background
  • Precursor Systems
  • Brief Overview of MPI
  • Development History of MPICH
  • Related Work
  • Portability and Performance
  • Portability of MPICH
  • Exploiting High-Performance Switches
  • Exploiting Shared-Memory Architectures
  • Exploiting Networks of Workstations
  • Performance of MPICH
  • Performance Measurement Problems and Pitfalls
  • Benchmarks for Point-to-Point Operations
  • Performance of MPICH Compared with Native Vendor Systems
  • Paragon Measurements
  • IBM SP2 Measurements
  • SGI Power Challenge Measurements
  • Cray T3D Measurements
  • Workstation Network Measurements
  • Architecture of MPICH
  • The Abstract Device Interface
  • The Channel Interface
  • A Case Study
  • Selected Subsystems
  • Groups
  • Communicators
  • Collective Operations
  • Attributes
  • Topologies
  • The Profiling Interface
  • The Fortran Interface
  • Job Startup
  • Building MPICH
  • Documentation
  • Toward a Portable Parallel Programming Environment
  • The MPE Extension Library
  • Command-Line Arguments and Standard I/O
  • Support for Performance Analysis and Debugging
  • Profiling Libraries
  • Upshot
  • Support for Adding New Profiling Libraries
  • Useful Commands
  • Network Management Tools
  • Example Programs
  • Software Management Techniques and Tools
  • Configuring for Different Systems
  • Source Code Management
  • Testing
  • Tracking and Responding to Problem Reports
  • Preparing a New Release
  • Lessons Learned
  • Language Bindings
  • Performance
  • Resource Limits
  • Heterogeneity and Interoperability
  • 64-bit Issues
  • Unresolved Issues
  • Status and Plans
  • Vendor Interactions
  • Other Users
  • Planned Enhancements
  • MPI-2
  • Summary
  • Acknowledgments
  • Bibliography