vendredi 25 janvier 2019

C++ std::sort implementation

I am wondering as to the implementation of std::sort in c++11. I have an MPI-managed parallel code, where each rank reads data from a file into a vector A that needs to be sorted. Each rank does calls std::sort to do this.

When I run this with ~100 ranks, there is sometimes one rank which hangs at this call to std::sort. Eventually, I realized, it's not hanging, the sort just takes very long. That is, one rank will take ~200 times longer to sort than all of the others.

At first I suspected it was a load-balancing issue. Nope, I've checked thoroughly that the size of A per rank is as balanced as possible.

I've concluded that it may just simply be that one rank has an initial condition of A such that something like the worst-case performance of quicksort is realized (or at least a non-ideal-case).

Why do I think this?

  • If I change the MPI configuration (thereby perturbing the content of A per rank, since it comes from a file read), the issue disappears, or it can move to other ranks.
  • If I change std::sort to std::stable_sort (no longer using the quicksort algorithm), then all is fine.

However, it seems that it would be most sensible to implement a quicksort by choosing a random pivot point on each iteration. If that were the case with std::sort, then it would be overwhelmingly unlikely to choose a worst-case value randomly from A on many iterations (which would be required to result in a 200x performance hit).

Thus, my observations suggest that std::sort implements a fixed quicksort pivot value (e.g. always choose the first value in the array, or something like that). This is the only way that the behavior I'm seeing would be likely, and also give consistent results when re-running on the same MPI configuration (which it does).

Am I correct in that conclusion? I did manage to find the std source, but the sort function is totally unreadable, and makes a plethora of calls to various helper functions, and I'd rather avoid a rabbit hole. Aside from that, I'm running on an HPC system, and it's not even clear to me how to be sure what exactly mpicxx is linking to. I can't find any documentation which describe the algorithm implementation

Aucun commentaire:

Enregistrer un commentaire