In this example, the computation is split into two parts: the part that needs
data from the other processors, and the part that doesn't. The part that is
independent of the other processors represents the interior of the
domain (relative to each processor); the part that requires data from other
processors is the boundary. The communication is started first, the
computation on the interior takes place while the messages are in flight,
and finally the communication is completed and the computation on the
boundary is performed.
This version calls MPI_Isend before MPI_Irecv. Posting the sends first should
be slightly better for systems that use a sender-push rendezvous strategy.