The reason for using bufsize = filesize/numprocs + 1 is the case where
filesize < numprocs. If the "+1" is left out, then in that case all
processes read zero elements. With the "+1", the whole file will be read.
The cost is that if filesize is evenly divided by numprocs, each process
reads a less-than-optimal number of elements.
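The arithmetic behind this note can be checked with a small sketch. The helper names block_size and ceil_block are hypothetical and do not come from the book, and the ceiling-division variant is shown only for comparison as one possible alternative, not as the book's method. Both functions assume filesize and numprocs are positive and are measured in the same unit.

```c
#include <stdio.h>

/* Per-process count as described in the note: filesize/numprocs + 1.
   Hypothetical helper name; integer division truncates toward zero. */
static long block_size(long filesize, long numprocs)
{
    return filesize / numprocs + 1;
}

/* Assumed alternative (not from the book): ceiling division, which
   adds nothing extra when numprocs divides filesize evenly. */
static long ceil_block(long filesize, long numprocs)
{
    return (filesize + numprocs - 1) / numprocs;
}

/* Print both counts for one case so the trade-off is visible. */
static void show_case(long filesize, long numprocs)
{
    printf("filesize=%ld numprocs=%ld: without +1 -> %ld, with +1 -> %ld, ceiling -> %ld\n",
           filesize, numprocs, filesize / numprocs,
           block_size(filesize, numprocs), ceil_block(filesize, numprocs));
}
```

Calling show_case(3, 8) illustrates the case the note describes: without the "+1" every process would read 0 elements, while with it each reads 1. Calling show_case(16, 4) shows the cost: each process is assigned 5 elements although 4 would suffice.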
Thanks to Chieh-Sen Huang <huangcs@math.nsysu.edu.tw>.
theFile.Read(buf, bufsize, MPI_INT, &status );
should read
theFile.Read(buf, bufsize, MPI_INT, status );
over the communicator specified in its last argument
but should read
over the communicator specified in its second-to-last argument
since the MPI Window object is returned in the last argument.
Thanks to Brad Penoff.
Thanks to Jeff Squyres <squyres@cse.nd.edu>.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
volatile int i; ...to indicate that i must also be declared volatile.
Thanks to Brian Toonen <toonen@mcs.anl.gov>.
volatile int i = 0;
int j = 0;
while (i < 10) {
    lock();
    if (i < 10) {
        i = i + 1;
        j = j + 1;
    }
    unlock();
}
printf( "j = %d\n", j );
The reason for the second test is that two threads could both test i <
10 when i is 9, and then (in the original code) both would
increment i. The revised code performs a quick test outside of the
lock; if the test is true, the thread acquires the lock and performs the
test again. If the test is now false, the thread releases the lock without
incrementing i; if the test is still true, then the thread increments
i.
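The test-lock-test pattern described above can be sketched with POSIX threads. This is only an illustration under several assumptions: lock() and unlock() are mapped to a pthread mutex, the names worker, run_workers, and LIMIT are hypothetical, and j is kept as one shared total protected by the lock rather than a per-thread variable. The sketch also uses a C11 atomic_int for i instead of volatile, because C11 defines the unlocked concurrent read only for atomic objects; that substitution is not part of the book's code.

```c
#include <pthread.h>
#include <stdatomic.h>

#define LIMIT 10                 /* hypothetical name for the bound 10 */

static atomic_int i_shared;      /* plays the role of i in the text */
static int j_total;              /* total increments; protected by the lock */
static pthread_mutex_t lock_var = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    /* Quick test outside the lock */
    while (atomic_load(&i_shared) < LIMIT) {
        pthread_mutex_lock(&lock_var);
        /* Second test inside the lock: another thread may have
           incremented i between our first test and acquiring the lock */
        if (atomic_load(&i_shared) < LIMIT) {
            atomic_fetch_add(&i_shared, 1);
            j_total = j_total + 1;
        }
        pthread_mutex_unlock(&lock_var);
    }
    return NULL;
}

/* Run up to 16 worker threads and return the total number of increments. */
static int run_workers(int nthreads)
{
    pthread_t tid[16];
    int created = 0;

    if (nthreads > 16) nthreads = 16;
    atomic_store(&i_shared, 0);
    j_total = 0;
    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&tid[created], NULL, worker, NULL) == 0)
            created++;
    }
    for (int t = 0; t < created; t++)
        pthread_join(tid[t], NULL);
    return j_total;
}
```

With the second test in place, run_workers returns LIMIT regardless of how many threads run. Without it, two threads that both pass the outer test when i is 9 could each increment, pushing the total past the limit.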
Thanks to Brian Toonen <toonen@mcs.anl.gov>.
Thanks to Takao Hatazaki <Takao.Hatazaki@JP.COMPAQ.com>.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
newtype = MPI::Datatype::Match_size( MPI::TYPECLASS_INTEGER,
                                     sizeof(MPI::Aint )
should be
newtype = MPI::Datatype::Match_size( MPI::TYPECLASS_INTEGER,
                                     sizeof(MPI::Aint ) );
subroutine exchng1( a, nx, s, e, win, &
                    bottom_nbr, top_nbr )
  use mpi
  integer nx, s, e
  double precision a(0:nx+1,s-1:e+1)
  integer win, bottom_nbr, top_nbr
  integer ierr
  call MPI_WIN_FENCE( 0, win, ierr )
  ! Put top edge into top neighbor's ghost cells
  call MPI_PUT( a(1,e), nx, MPI_DOUBLE_PRECISION, &
                top_nbr, 1, nx, MPI_DOUBLE_PRECISION, win, ierr )
  ! Get top edge from top neighbor's first column
  call MPI_GET( a(1,e+1), nx, MPI_DOUBLE_PRECISION, &
                top_nbr, nx + 3, nx, MPI_DOUBLE_PRECISION, win, ierr )
  call MPI_WIN_FENCE( 0, win, ierr )
  return
end
Thanks to Bo-Wen Shen <bwshen@hera.gsfc.nasa.gov> and Takao Hatazaki.
Instead of putting data into ghost cells only on remote processes, we can put data into the ghost cells of the process on the top, starting at a displacement of one, and we can get the ghost cells for our part of the grid on the bottom edge by getting grid data from the first column of the process on the bottom.
to
Instead of putting data into ghost cells only on remote processes, we can put data into the ghost cells of the process on the top, starting at a displacement of one, and we can get the ghost cells for our part of the grid on the top edge by getting grid data from the first column of the process on the top.
Thanks to Takao Hatazaki.
Also note that there is no explicit reference to the left_nbr in the above code: the "get from right neighbor" replaces the "put to left neighbor."
(e.g., we must send nx+1 values starting from a(0,m) rather than nx values starting from a(1,m)).
Thanks to Takao Hatazaki.
double precision a(sx-1:ex+1,sy-1:sy+1)
should be
double precision a(sx-1:ex+1,sy-1:ey+1)
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
do i=1,ny
   buf(i) = a(1,i-sy+1)
enddo
call MPI_WIN_FENCE( 0, winbuf, ierr )
call MPI_PUT( buf, ny, MPI_DOUBLE_PRECISION, top_nbr, &
              0, ny, MPI_DOUBLE_PRECISION, winbuf, ierr )
... similar code for the bottom edge
should be
do i=1,ny
   buf(i) = a(1,sy+i-1)
enddo
call MPI_WIN_FENCE( 0, winbuf, ierr )
call MPI_PUT( buf, ny, MPI_DOUBLE_PRECISION, left_nbr, &
              0, ny, MPI_DOUBLE_PRECISION, winbuf, ierr )
... similar code for the right edge
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
It would be better to move the data in t and immediately add it to s to form w.
to
It would be better to move the data in t and immediately add it to the t for rank zero to form w on rank zero.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
...(e.g., A = 0 for an array A in Fortran)
Thanks to Takao Hatazaki.
... less restrictive than writing to memory ...
Thanks to Brad Penoff.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thus, to compute the sum, we need only add up the contributions from the sibling of the node, the sibling of its parent, the sibling of its grandparent, and the sibling of grandparent's parent.
The original text had confusing use of plurals.
Thanks to Takao Hatazaki.
/* Get the largest power of two smaller than size */
mask = 1;
while (mask < size) mask <<= 1;
mask >>= 1;
level = 0;
idx = 0;
while (mask >= 1) {
    if (rank < mask) {
        /* go to left for acc_idx, go to right for
           get_idx. set idx=acc_idx for next iteration */
        acc_idx[level] = idx + 1;
        get_idx[level] = idx + mask*2;
        idx = idx + 1;
    }
    else {
        /* go to right for acc_idx, go to left for
           get_idx. set idx=acc_idx for next iteration */
        acc_idx[level] = idx + mask*2;
        get_idx[level] = idx + 1;
        idx = idx + mask*2;
    }
    level++;
    rank = rank % mask;
    mask >>= 1;
}
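To see what the loop above produces for a given rank, it can be wrapped in a function and traced. The wrapper name tree_indices, its return value (the number of levels), and the helper print_tree_indices are hypothetical additions for illustration. The loop body is the one shown above, and the caller is assumed to supply arrays with at least one entry per tree level.

```c
#include <stdio.h>

/* Hypothetical wrapper around the loop above. Fills acc_idx[] and
   get_idx[] for the given rank and returns the number of levels. */
static int tree_indices(int rank, int size, int acc_idx[], int get_idx[])
{
    int mask = 1, level = 0, idx = 0;

    /* Get the largest power of two smaller than size */
    while (mask < size) mask <<= 1;
    mask >>= 1;

    while (mask >= 1) {
        if (rank < mask) {
            /* left child is the accumulate target, right child is fetched */
            acc_idx[level] = idx + 1;
            get_idx[level] = idx + mask*2;
            idx = idx + 1;
        }
        else {
            /* right child is the accumulate target, left child is fetched */
            acc_idx[level] = idx + mask*2;
            get_idx[level] = idx + 1;
            idx = idx + mask*2;
        }
        level++;
        rank = rank % mask;
        mask >>= 1;
    }
    return level;
}

/* Print the per-level indices for one rank (illustrative helper). */
static void print_tree_indices(int rank, int size)
{
    int acc[32], get[32];
    int levels = tree_indices(rank, size, acc, get);
    for (int l = 0; l < levels; l++)
        printf("rank %d level %d: acc_idx=%d get_idx=%d\n",
               rank, l, acc[l], get[l]);
}
```

For size 4, rank 0 yields acc_idx = {1, 2} and get_idx = {4, 3}, while rank 3 yields acc_idx = {4, 6} and get_idx = {1, 5}. At each level a rank accumulates into the node on its own path and fetches from that node's sibling.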
Thanks to Rajeev Thakur.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
Thanks to Brian Toonen <toonen@mcs.anl.gov>.
Thanks to Brian Toonen <toonen@mcs.anl.gov>.
Thanks to Brian Toonen <toonen@mcs.anl.gov>.
Thanks to Brian Toonen <toonen@mcs.anl.gov>.
Thanks to Takao Hatazaki.
Thanks to Takao Hatazaki.
/* Any process can call this to fetch and increment by value */
void counter_nxtval( MPI_Comm counter_comm, int incr, int *value )
{
    MPI_Send(&incr, 1, MPI_INT, 0, 0, counter_comm);
    MPI_Recv(value, 1, MPI_INT, 0, 0, counter_comm, MPI_STATUS_IGNORE);
}
(the arguments to MPI_Send were wrong).
Thanks to Takao Hatazaki.