p4 is a library of routines designed to express a wide variety of parallel algorithms portably, efficiently and simply. The goal of portability requires it to use widely accepted models of computation rather than specific vendor implementations of those models. The goal of efficiency requires it to use models of computation relatively close to those provided by the machines themselves and their system software. And the goal of simplicity requires it to provide programmers with a relatively small number of concepts, while providing a rich enough set that they can express the algorithms they have designed.
These goals are not always consistent. In some cases, the inconsistency has been resolved in p4 by providing multiple ways to do things. (For example, p4 provides completely automatic buffer management, but if a programmer prefers to deal with it himself to avoid the overhead of an extra copy operation, then p4 provides the appropriate buffer-management routines.) In other cases, judgments have been made regarding the balancing of portability, efficiency, and simplicity considerations. In many situations, considerable complexity has been absorbed into p4 itself in order to provide simplicity and portability to the programmer.
The most distinguishing feature of p4 is its support for multiple models of parallel computation. For the shared-memory MIMD model, it provides the monitor paradigm [15] for coordinating access to shared data, and runs on "true" shared-memory computers such as the Sequent Symmetry and Alliant FX/2800, as well as on NUMA (non-uniform memory access) machines that provide a shared-memory computational model, like the BBN TC-2000 and Kendall Square KSR-1. For the distributed-memory MIMD model, it provides the "usual" typed message-passing functions and global operations, and supplies implementations on all the platforms that support this model: distributed-memory machines such as the Intel Touchstone Delta and TMC CM-5, shared-memory machines such as the Sequent Symmetry and Kendall Square KSR-1, and heterogeneous networks of workstations. It also provides for explicit management of clusters, in which the shared- and distributed-memory MIMD models are used explicitly at the same time. It provides no support for the SIMD computational model.
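To make the monitor paradigm mentioned above concrete, the following is a minimal sketch in C. It does not use the p4 interface: it assumes POSIX threads as a stand-in for p4's shared-memory processes, and every name in it (work_monitor, monitor_get_index, run_workers) is hypothetical and chosen only for illustration. The monitor bundles a lock with the shared data it protects, so that workers can claim work indices only through a single guarded operation.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical monitor: a lock plus the shared data it protects.
   Illustrative only; this is not the p4 API. */
typedef struct {
    pthread_mutex_t lock;  /* guards entry to the monitor */
    int next;              /* next unclaimed work index */
    int limit;             /* one past the last work index */
} work_monitor;

void monitor_init(work_monitor *m, int limit) {
    assert(limit >= 0);
    pthread_mutex_init(&m->lock, NULL);
    m->next = 0;
    m->limit = limit;
}

/* Monitor operation: claim the next index, or return -1 if none remain. */
int monitor_get_index(work_monitor *m) {
    int claimed = -1;
    pthread_mutex_lock(&m->lock);    /* enter the monitor */
    if (m->next < m->limit)
        claimed = m->next++;
    pthread_mutex_unlock(&m->lock);  /* exit the monitor */
    return claimed;
}

typedef struct {
    work_monitor *monitor;
    long sum;  /* private per-worker total, needs no locking */
} worker_arg;

static void *worker(void *p) {
    worker_arg *arg = p;
    int i;
    /* Each index is handed out exactly once across all workers. */
    while ((i = monitor_get_index(arg->monitor)) >= 0)
        arg->sum += i;
    return NULL;
}

/* Run up to 16 workers over indices 0..limit-1.
   Returns the sum of all claimed indices, or -1 if a thread
   could not be created. */
long run_workers(int nworkers, int limit) {
    enum { MAX_WORKERS = 16 };
    pthread_t threads[MAX_WORKERS];
    worker_arg args[MAX_WORKERS];
    work_monitor m;
    long total = 0;
    int started = 0, failed = 0, t;

    if (nworkers < 1) nworkers = 1;
    if (nworkers > MAX_WORKERS) nworkers = MAX_WORKERS;
    monitor_init(&m, limit);

    for (t = 0; t < nworkers; t++) {
        args[t].monitor = &m;
        args[t].sum = 0;
        if (pthread_create(&threads[t], NULL, worker, &args[t]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    for (t = 0; t < started; t++) {
        pthread_join(threads[t], NULL);
        total += args[t].sum;
    }
    pthread_mutex_destroy(&m.lock);
    return failed ? -1 : total;
}
```

Because the shared counter is touched only inside monitor_get_index, the result of run_workers does not depend on how the workers interleave: with 100 indices the total is the sum of 0 through 99 whatever the worker count.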
In the following sections, we describe p4 in detail. We first outline the history of the branch of portable parallel programming research at Argonne that has given rise to p4 and explain its relationship with other systems. We then outline the basic functions in the library and describe some more advanced features as well. Next we describe the implementation, including some interesting aspects of p4 not visible to the user, and then a representative sample of p4 applications that illustrates some of the uses to which p4 has been put by its users. Finally, we mention some related projects and enhancements, largely done by others, that have added useful features to p4, and conclude with some reflections and future plans.