I have a parallel code, but I can't tell whether it actually works correctly in parallel. I have two vectors A and B whose elements are matrices defined with a proper class. Since the matrices in the vectors are not of a primitive type, I can't send these vectors to the other ranks with MPI_Scatter, so I have to use MPI_Send and MPI_Recv. Also, rank 0 has only a coordination role: it sends the other ranks the blocks they should work on and collects the results at the end, but it does not participate in the computation.
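For context, this is roughly the kind of matrix class I mean (a minimal sketch, not the exact dense_matrix I use; the relevant point is that the n x n entries are stored contiguously and the class exposes rows() and data(), which is what the MPI calls below rely on):

#include <vector>

class dense_matrix {
public:
    dense_matrix() = default;
    dense_matrix(unsigned r, unsigned c) : n_rows(r), n_cols(c), values(r * c, 0.0) {}
    unsigned rows() const { return n_rows; }
    unsigned cols() const { return n_cols; }
    // contiguous buffer of n_rows * n_cols doubles, suitable for MPI_Send / MPI_Recv
    double* data() { return values.data(); }
    const double* data() const { return values.data(); }
    // naive product, just so that local_A * local_B below compiles
    friend dense_matrix operator*(const dense_matrix& a, const dense_matrix& b) {
        dense_matrix c(a.n_rows, b.n_cols);
        for (unsigned i = 0; i < a.n_rows; ++i)
            for (unsigned k = 0; k < a.n_cols; ++k)
                for (unsigned j = 0; j < b.n_cols; ++j)
                    c.values[i * b.n_cols + j] += a.values[i * a.n_cols + k] * b.values[k * b.n_cols + j];
        return c;
    }
private:
    unsigned n_rows = 0, n_cols = 0;
    std::vector<double> values;
};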
The solution to the exercise is the following:
// rank 0 sends the blocks to the other ranks, which compute the local
// block products; rank 0 then receives the partial results and prints the
// global vector
if (rank == 0)
{
    // send data
    for (unsigned j = 0; j < N_blocks; ++j) {
        int dest = j / local_N_blocks + 1;
        // send number of rows
        unsigned n = A[j].rows();
        MPI_Send(&n, 1, MPI_UNSIGNED, dest, 1, MPI_COMM_WORLD);
        // send blocks
        MPI_Send(A[j].data(), n*n, MPI_DOUBLE, dest, 2, MPI_COMM_WORLD);
        MPI_Send(B[j].data(), n*n, MPI_DOUBLE, dest, 3, MPI_COMM_WORLD);
    }
    // ... for loop for rank 0 to receive the results from the other ranks ...
}
// all the other ranks receive the blocks and compute the local block
// products, then send the results to rank 0
else
{
    // local vector
    std::vector<dense_matrix> local_C(local_N_blocks);
    // receive data and compute products
    for (unsigned j = 0; j < local_N_blocks; ++j) {
        // receive number of rows
        unsigned n;
        MPI_Recv(&n, 1, MPI_UNSIGNED, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // initialize blocks
        dense_matrix local_A(n, n);
        dense_matrix local_B(n, n);
        // receive blocks
        MPI_Recv(local_A.data(), n*n, MPI_DOUBLE, 0, 2, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Recv(local_B.data(), n*n, MPI_DOUBLE, 0, 3, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // compute product
        local_C[j] = local_A * local_B;
    }
    // ... for loop for ranks != 0 to send the results to rank 0 ...
}
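The two loops I left out above are not where my doubt is, but a sketch of what I have in mind is the following (tags 4 and 5 and the result vector C are arbitrary choices made just for this sketch):

// rank 0 (goes after the sending loop, inside the if branch):
// collect the results in the same order in which the blocks were sent
std::vector<dense_matrix> C(N_blocks);
for (unsigned j = 0; j < N_blocks; ++j) {
    int src = j / local_N_blocks + 1;
    unsigned n;
    MPI_Recv(&n, 1, MPI_UNSIGNED, src, 4, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    C[j] = dense_matrix(n, n);
    MPI_Recv(C[j].data(), n*n, MPI_DOUBLE, src, 5, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
}

// ranks != 0 (goes inside the else branch, after the compute loop):
// send the local products back to rank 0
for (unsigned j = 0; j < local_N_blocks; ++j) {
    unsigned n = local_C[j].rows();
    MPI_Send(&n, 1, MPI_UNSIGNED, 0, 4, MPI_COMM_WORLD);
    MPI_Send(local_C[j].data(), n*n, MPI_DOUBLE, 0, 5, MPI_COMM_WORLD);
}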
In my opinion, if local_N_blocks = N_blocks / (size - 1) is different from 1, the variable dest does not change value at every loop iteration. So, after the first iteration of the "sending loop", the second time that rank 0 reaches
MPI_Send(A[j].data(), n*n, MPI_DOUBLE, dest, 2, MPI_COMM_WORLD);
MPI_Send(B[j].data(), n*n, MPI_DOUBLE, dest, 3, MPI_COMM_WORLD);
it has to wait until the operation local_C[j] = local_A * local_B for the previous j has completed, so the code does not seem well parallelized to me. What do you think?
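To make the point about dest concrete, here is a tiny standalone check (the values N_blocks = 4 and size = 3 are just an example I picked):

#include <iostream>

int main() {
    const int size = 3;                                     // example: 1 coordinator + 2 workers
    const unsigned N_blocks = 4;                            // example number of blocks
    const unsigned local_N_blocks = N_blocks / (size - 1);  // = 2
    for (unsigned j = 0; j < N_blocks; ++j)
        std::cout << "j = " << j << " -> dest = " << (j / local_N_blocks + 1) << '\n';
    // prints dest = 1, 1, 2, 2: the same destination appears in consecutive iterations
    return 0;
}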