Using MPI and Using Advanced MPI

These two books, published in 2014, show how to use MPI, the Message-Passing Interface, to write parallel programs. Using MPI, now in its third edition, introduces MPI through examples of the parallel computing code needed for simulations of partial differential equations and n-body problems. Using Advanced MPI covers additional features of MPI, including parallel I/O, one-sided (remote memory access) communication, and the use of threads and shared memory from MPI.

What is MPI?

MPI, the Message-Passing Interface, is an application programming interface (API) for programming parallel computers. It was first released in 1992 and transformed scientific parallel computing. Today, MPI is widely used on everything from laptops (where it makes programs easy to develop and debug) to the world's largest and fastest computers. Among the reasons for the success of MPI are its focus on performance, scalability, and support for building tools and libraries that extend the power of MPI.

Using MPI: Table of Contents

	Series Foreword	xiii
	Preface to the Third Edition	xv
	Preface to the Second Edition	xix
	Preface to the First Edition	xxi
1	Background	1
1.1	Why Parallel Computing?	1
1.2	Obstacles to Progress	2
1.3	Why Message Passing?	3
1.3.1	Parallel Computational Models	3
1.3.2	Advantages of the Message-Passing Model	9
1.4	Evolution of Message-Passing Systems	10
1.5	The MPI Forum	11
2	Introduction to MPI	13
2.1	Goal	13
2.2	What Is MPI?	13
2.3	Basic MPI Concepts	14
2.4	Other Interesting Features of MPI	18
2.5	Is MPI Large or Small?	20
2.6	Decisions Left to the Implementor	21
3	Using MPI in Simple Programs	23
3.1	A First MPI Program	23
3.2	Running Your First MPI Program	28
3.3	A First MPI Program in C	29
3.4	Using MPI from Other Languages	29
3.5	Timing MPI Programs	31
3.6	A Self-Scheduling Example: Matrix-Vector Multiplication	32
3.7	Studying Parallel Performance	38
3.7.1	Elementary Scalability Calculations	39
3.7.2	Gathering Data on Program Execution	41
3.7.3	Instrumenting a Parallel Program with MPE Logging	42
3.7.4	Events and States	43
3.7.5	Instrumenting the Matrix-Matrix Multiply Program	43
3.7.6	Notes on Implementation of Logging	47
3.7.7	Graphical Display of Logfiles	48
3.8	Using Communicators	49
3.9	Another Way of Forming New Communicators	55
3.10	A Handy Graphics Library for Parallel Programs	57
3.11	Common Errors and Misunderstandings	60
3.12	Summary of a Simple Subset of MPI	62
3.13	Application: Computational Fluid Dynamics	62
3.13.1	Parallel Formulation	63
3.13.2	Parallel Implementation	65
4	Intermediate MPI	69
4.1	The Poisson Problem	70
4.2	Topologies	73
4.3	A Code for the Poisson Problem	81
4.4	Using Nonblocking Communications	91
4.5	Synchronous Sends and “Safe” Programs	94
4.6	More on Scalability	95
4.7	Jacobi with a 2-D Decomposition	98
4.8	An MPI Derived Datatype	100
4.9	Overlapping Communication and Computation	101
4.10	More on Timing Programs	105
4.11	Three Dimensions	106
4.12	Common Errors and Misunderstandings	107
4.13	Application: Nek5000/NekCEM	108
5	Fun with Datatypes	113
5.1	MPI Datatypes	113
5.1.1	Basic Datatypes and Concepts	113
5.1.2	Derived Datatypes	116
5.1.3	Understanding Extents	118
5.2	The N-Body Problem	119
5.2.1	Gather	120
5.2.2	Nonblocking Pipeline	124
5.2.3	Moving Particles between Processes	127
5.2.4	Sending Dynamically Allocated Data	132
5.2.5	User-Controlled Data Packing	134
5.3	Visualizing the Mandelbrot Set	136
5.3.1	Sending Arrays of Structures	144
5.4	Gaps in Datatypes	146
5.5	More on Datatypes for Structures	148
5.6	Deprecated and Removed Functions	149
5.7	Common Errors and Misunderstandings	150
5.8	Application: Cosmological Large-Scale Structure Formation	152
6	Parallel Libraries	155
6.1	Motivation	155
6.1.1	The Need for Parallel Libraries	155
6.1.2	Common Deficiencies of Early Message-Passing Systems	156
6.1.3	Review of MPI Features That Support Libraries	158
6.2	A First MPI Library	161
6.3	Linear Algebra on Grids	170
6.3.1	Mappings and Logical Grids	170
6.3.2	Vectors and Matrices	175
6.3.3	Components of a Parallel Library	177
6.4	The LINPACK Benchmark in MPI	179
6.5	Strategies for Library Building	183
6.6	Examples of Libraries	184
6.7	Application: Nuclear Green’s Function Monte Carlo	185
7	Other Features of MPI	189
7.1	Working with Global Data	189
7.1.1	Shared Memory, Global Data, and Distributed Memory	189
7.1.2	A Counter Example	190
7.1.3	The Shared Counter Using Polling Instead of an Extra Process	193
7.1.4	Fairness in Message Passing	196
7.1.5	Exploiting Request-Response Message Patterns	198
7.2	Advanced Collective Operations	201
7.2.1	Data Movement	201
7.2.2	Collective Computation	201
7.2.3	Common Errors and Misunderstandings	206
7.3	Intercommunicators	208
7.4	Heterogeneous Computing	216
7.5	Hybrid Programming with MPI and OpenMP	217
7.6	The MPI Profiling Interface	218
7.6.1	Finding Buffering Problems	221
7.6.2	Finding Load Imbalances	223
7.6.3	Mechanics of Using the Profiling Interface	223
7.7	Error Handling	226
7.7.1	Error Handlers	226
7.7.2	Example of Error Handling	229
7.7.3	User-Defined Error Handlers	229
7.7.4	Terminating MPI Programs	232
7.7.5	Common Errors and Misunderstandings	232
7.8	The MPI Environment	234
7.8.1	Processor Name	236
7.8.2	Is MPI Initialized?	236
7.9	Determining the Version of MPI	237
7.10	Other Functions in MPI	239
7.11	Application: No-Core Configuration Interaction Calculations in Nuclear Physics	240
8	Understanding How MPI Implementations Work	245
8.1	Introduction	245
8.1.1	Sending Data	245
8.1.2	Receiving Data	246
8.1.3	Rendezvous Protocol	246
8.1.4	Matching Protocols to MPI’s Send Modes	247
8.1.5	Performance Implications	248
8.1.6	Alternative MPI Implementation Strategies	249
8.1.7	Tuning MPI Implementations	249
8.2	How Difficult Is MPI to Implement?	249
8.3	Device Capabilities and the MPI Library Definition	250
8.4	Reliability of Data Transfer	251
9	Comparing MPI with Sockets	253
9.1	Process Startup and Shutdown	255
9.2	Handling Faults	257
10	Wait! There’s More!	259
10.1	Beyond MPI-1	259
10.2	Using Advanced MPI	260
10.3	Will There Be an MPI-4?	261
10.4	Beyond Message Passing Altogether	261
10.5	Final Words	262
	Glossary of Selected Terms	263
A	The MPE Multiprocessing Environment	273
A.1	MPE Logging	273
A.2	MPE Graphics	275
A.3	MPE Helpers	276
B	MPI Resources Online	279
C	Language Details	281
C.1	Arrays in C and Fortran	281
C.1.1	Column and Row Major Ordering	281
C.1.2	Meshes vs. Matrices	281
C.1.3	Higher Dimensional Arrays	282
C.2	Aliasing	285
	References	287
	Subject Index	301
	Function and Term Index	305

Using Advanced MPI: Table of Contents

	Series Foreword	xv
	Foreword	xvii
	Preface	xix
1	Introduction	1
1.1	MPI-1 and MPI-2	1
1.2	MPI-3	2
1.3	Parallelism and MPI	3
1.3.1	Conway’s Game of Life	4
1.3.2	Poisson Solver	5
1.4	Passing Hints to the MPI Implementation with MPI_Info	11
1.4.1	Motivation, Description, and Rationale	12
1.4.2	An Example from Parallel I/O	12
1.5	Organization of This Book	13
2	Working with Large-Scale Systems	15
2.1	Nonblocking Collectives	16
2.1.1	Example: 2-D FFT	16
2.1.2	Example: Five-Point Stencil	19
2.1.3	Matching, Completion, and Progression	20
2.1.4	Restrictions	22
2.1.5	Collective Software Pipelining	23
2.1.6	A Nonblocking Barrier?	27
2.1.7	Nonblocking Allreduce and Krylov Methods	30
2.2	Distributed Graph Topologies	31
2.2.1	Example: The Peterson Graph	37
2.2.2	Edge Weights	37
2.2.3	Graph Topology Info Argument	39
2.2.4	Process Reordering	39
2.3	Collective Operations on Process Topologies	40
2.3.1	Neighborhood Collectives	41
2.3.2	Vector Neighborhood Collectives	44
2.3.3	Nonblocking Neighborhood Collectives	45
2.4	Advanced Communicator Creation	48
2.4.1	Nonblocking Communicator Duplication	48
2.4.2	Noncollective Communicator Creation	50
3	Introduction to Remote Memory Operations	55
3.1	Introduction	57
3.2	Contrast with Message Passing	59
3.3	Memory Windows	62
3.3.1	Hints on Choosing Window Parameters	64
3.3.2	Relationship to Other Approaches	65
3.4	Moving Data	65
3.4.1	Reasons for Using Displacement Units	69
3.4.2	Cautions in Using Displacement Units	70
3.4.3	Displacement Sizes in Fortran	71
3.5	Completing RMA Data Transfers	71
3.6	Examples of RMA Operations	73
3.6.1	Mesh Ghost Cell Communication	74
3.6.2	Combining Communication and Computation	84
3.7	Pitfalls in Accessing Memory	88
3.7.1	Atomicity of Memory Operations	89
3.7.2	Memory Coherency	90
3.7.3	Some Simple Rules for RMA	91
3.7.4	Overlapping Windows	93
3.7.5	Compiler Optimizations	93
3.8	Performance Tuning for RMA Operations	95
3.8.1	Options for MPI_Win_create	95
3.8.2	Options for MPI_Win_fence	97
4	Advanced Remote Memory Access	101
4.1	Passive Target Synchronization	101
4.2	Implementing Blocking, Independent RMA Operations	102
4.3	Allocating Memory for MPI Windows	104
4.3.1	Using MPI_Alloc_mem and MPI_Win_allocate from C	104
4.3.2	Using MPI_Alloc_mem and MPI_Win_allocate from Fortran 2008	105
4.3.3	Using MPI_ALLOC_MEM and MPI_WIN_ALLOCATE from Older Fortran	107
4.4	Another Version of NXTVAL	108
4.4.1	The Nonblocking Lock	110
4.4.2	NXTVAL with MPI_Fetch_and_op	110
4.4.3	Window Attributes	112
4.5	An RMA Mutex	115
4.6	Global Arrays	120
4.6.1	Create and Free	122
4.6.2	Put and Get	124
4.6.3	Accumulate	127
4.6.4	The Rest of Global Arrays	128
4.7	A Better Mutex	130
4.8	Managing a Distributed Data Structure	131
4.8.1	A Shared-Memory Distributed List Implementation	132
4.8.2	An MPI Implementation of a Distributed List	135
4.8.3	Inserting into a Distributed List	140
4.8.4	An MPI Implementation of a Dynamic Distributed List	143
4.8.5	Comments on More Concurrent List Implementations	145
4.9	Compiler Optimization and Passive Targets	148
4.10	MPI RMA Memory Models	149
4.11	Scalable Synchronization	152
4.11.1	Exposure and Access Epochs	152
4.11.2	The Ghost-Point Exchange Revisited	153
4.11.3	Performance Optimizations for Scalable Synchronization	155
4.12	Summary	156
5	Using Shared Memory with MPI	157
5.1	Using MPI Shared Memory	159
5.1.1	Shared On-Node Data Structures	159
5.1.2	Communication through Shared Memory	160
5.1.3	Reducing the Number of Subdomains	163
5.2	Allocating Shared Memory	163
5.3	Address Calculation	165
6	Hybrid Programming	169
6.1	Background	169
6.2	Thread Basics and Issues	170
6.2.1	Thread Safety	171
6.2.2	Performance Issues with Threads	172
6.2.3	Threads and Processes	173
6.3	MPI and Threads	173
6.4	Yet Another Version of NXTVAL	176
6.5	Nonblocking Version of MPI_Comm_accept	178
6.6	Hybrid Programming with MPI	179
6.7	MPI Message and Thread-Safe Probe	182
7	Parallel I/O	187
7.1	Introduction	187
7.2	Using MPI for Simple I/O	187
7.2.1	Using Individual File Pointers	187
7.2.2	Using Explicit Offsets	191
7.2.3	Writing to a File	194
7.3	Noncontiguous Accesses and Collective I/O	195
7.3.1	Noncontiguous Accesses	195
7.3.2	Collective I/O	199
7.4	Accessing Arrays Stored in Files	203
7.4.1	Distributed Arrays	204
7.4.2	A Word of Warning about Darray	206
7.4.3	Subarray Datatype Constructor	207
7.4.4	Local Array with Ghost Area	210
7.4.5	Irregularly Distributed Arrays	211
7.5	Nonblocking I/O and Split Collective I/O	215
7.6	Shared File Pointers	216
7.7	Passing Hints to the Implementation	219
7.8	Consistency Semantics	221
7.8.1	Simple Cases	224
7.8.2	Accessing a Common File Opened with MPI_COMM_WORLD	224
7.8.3	Accessing a Common File Opened with MPI_COMM_SELF	227
7.8.4	General Recommendation	228
7.9	File Interoperability	229
7.9.1	File Structure	229
7.9.2	File Data Representation	230
7.9.3	Use of Datatypes for Portability	231
7.9.4	User-Defined Data Representations	233
7.10	Achieving High I/O Performance with MPI	234
7.10.1	The Four “Levels” of Access	234
7.10.2	Performance Results	237
7.11	An Example Application	238
7.12	Summary	242
8	Coping with Large Data	243
8.1	MPI Support for Large Data	243
8.2	Using Derived Datatypes	243
8.3	Example	244
8.4	Limitations of This Approach	245
8.4.1	Collective Reduction Functions	245
8.4.2	Irregular Collectives	246
9	Support for Performance and Correctness Debugging	249
9.1	The Tools Interface	250
9.1.1	Control Variables	251
9.1.2	Performance Variables	257
9.2	Info, Assertions, and MPI Objects	263
9.3	Debugging and the MPIR Debugger Interface	267
9.4	Summary	269
10	Dynamic Process Management	271
10.1	Intercommunicators	271
10.2	Creating New MPI Processes	271
10.2.1	Parallel cp: A Simple System Utility	272
10.2.2	Matrix-Vector Multiplication Example	279
10.2.3	Intercommunicator Collective Operations	284
10.2.4	Intercommunicator Point-to-Point Communication	285
10.2.5	Finding the Number of Available Processes	285
10.2.6	Passing Command-Line Arguments to Spawned Programs	290
10.3	Connecting MPI Processes	291
10.3.1	Visualizing the Computation in an MPI Program	292
10.3.2	Accepting Connections from Other Programs	294
10.3.3	Comparison with Sockets	296
10.3.4	Moving Data between Groups of Processes	298
10.3.5	Name Publishing	299
10.4	Design of the MPI Dynamic Process Routines	302
10.4.1	Goals for MPI Dynamic Process Management	302
10.4.2	What MPI Did Not Standardize	303
11	Working with Modern Fortran	305
11.1	The mpi_f08 Module	305
11.2	Problems with the Fortran Interface	306
11.2.1	Choice Parameters in Fortran	307
11.2.2	Nonblocking Routines in Fortran	308
11.2.3	Array Sections	310
11.2.4	Trouble with LOGICAL	311
12	Features for Libraries	313
12.1	External Interface Functions	313
12.1.1	Decoding Datatypes	313
12.1.2	Generalized Requests	315
12.1.3	Adding New Error Codes and Classes	322
12.2	Mixed-Language Programming	324
12.3	Attribute Caching	327
12.4	Using Reduction Operations Locally	331
12.5	Error Handling	333
12.5.1	Error Handlers	333
12.5.2	Error Codes and Classes	335
12.6	Topics Not Covered in This Book	335
13	Conclusions	341
13.1	MPI Implementation Status	341
13.2	Future Versions of the MPI Standard	341
13.3	MPI at Exascale	342
	MPI Resources on the World Wide Web	343
	References	345
	Subject Index	353
	Function and Term Index	359
