I have two libraries that solve systems of linear algebraic equations (SLAE) on distributed memory using MPI; one of them is my own implementation. There is also an application that simulates thermal conductivity on a grid, with the mesh divided into equal parts among the processes. During the simulation it has to solve a SLAE with a sparse matrix stored in compressed form (CSR). The SLAE is solved with the conjugate gradient method.
I am writing in C++, and I noticed the following.
-
When the SLAE is solved via the application + my library_1, some MPI processes compute the local sums for the scalar products very slowly, while the rest compute them equally quickly. That is, I see unbalanced computation on balanced data.
-
When the SLAE is solved via the application + library_2, all MPI processes compute equally quickly and in a balanced manner.
-
I dumped this SLAE to a file and wrote a small program that only reads the SLAE from the file and solves it with my library_1. Now the MPI processes that previously computed the local sums slowly are as fast as the other MPI processes. That is, all computations are balanced.
-
The same small program, reading the given SLAE from the file and solving it with library_2: all MPI processes compute equally quickly and in a balanced manner, only it is about twice as fast.
What is happening? Does it have something to do with memory? Has anyone run into this?