I am wondering as to the implementation of std::sort
in c++11
. I have an MPI
-managed parallel code, where each rank reads data from a file into a vector A
that needs to be sorted. Each rank does calls std::sort
to do this.
When I run this with ~100 ranks, there is sometimes one rank which hangs at this call to std::sort
. Eventually, I realized, it's not hanging, the sort just takes very long. That is, one rank will take ~200 times longer to sort than all of the others.
At first I suspected it was a load-balancing issue. Nope, I've checked thoroughly that the size of A
per rank is as balanced as possible.
I've concluded that it may just simply be that one rank has an initial condition of A
such that something like the worst-case performance of quicksort is realized (or at least a non-ideal-case).
Why do I think this?
- If I change the
MPI
configuration (thereby perturbing the content ofA
per rank, since it comes from a file read), the issue disappears, or it can move to other ranks. - If I change
std::sort
tostd::stable_sort
(no longer using the quicksort algorithm), then all is fine.
However, it seems that it would be most sensible to implement a quicksort by choosing a random pivot point on each iteration. If that were the case with std::sort
, then it would be overwhelmingly unlikely to choose a worst-case value randomly from A
on many iterations (which would be required to result in a 200x performance hit).
Thus, my observations suggest that std::sort
implements a fixed quicksort pivot value (e.g. always choose the first value in the array, or something like that). This is the only way that the behavior I'm seeing would be likely, and also give consistent results when re-running on the same MPI
configuration (which it does).
Am I correct in that conclusion? I did manage to find the std
source, but the sort
function is totally unreadable, and makes a plethora of calls to various helper functions, and I'd rather avoid a rabbit hole. Aside from that, I'm running on an HPC system, and it's not even clear to me how to be sure what exactly mpicxx
is linking to. I can't find any documentation which describe the algorithm implementation
Aucun commentaire:
Enregistrer un commentaire