Tuesday, December 19, 2017

C++ async vs OpenMP tasks

In OpenMP, I can create a bunch of tasks as follows and run them asynchronously using some fixed number of threads:

#pragma omp parallel
{
   #pragma omp single 
   {
      for (int i = 0; i < 1000; i++) {
         #pragma omp task
         f(i);
      }
   }
}

In C++11, I can do something similar, though not quite the same, with std::async:

// std::future needs the result type; decltype(f(0)) is whatever f returns
std::vector<std::future<decltype(f(0))>> futures;
for (int i = 0; i < 1000; i++) {
   auto fut = std::async(f, i);
   futures.push_back(std::move(fut));
}
...
for (auto & fut : futures) {
  auto res = fut.get();
  // do something with res
}

What I worry about is efficiency. If I am correct, in OpenMP, tasks are stored in some task pool and then distributed to threads (automatically by the OpenMP runtime).

In C++, when std::async is invoked with the default launch policy, the implementation decides whether to run f(i) asynchronously in a new thread or to defer it until std::future::get is called.
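To make the two possibilities concrete, here is a small sketch that requests each launch policy explicitly instead of leaving the choice to the implementation. The function ran_on_other_thread is a hypothetical stand-in for f; it only reports whether it executed on a thread other than the caller's.

```cpp
#include <future>
#include <thread>

// Hypothetical stand-in for f: true if executed on a thread other than `caller`.
bool ran_on_other_thread(std::thread::id caller) {
    return std::this_thread::get_id() != caller;
}

// std::launch::async: the call runs as if on a new thread of execution.
bool async_runs_elsewhere() {
    auto fut = std::async(std::launch::async, ran_on_other_thread,
                          std::this_thread::get_id());
    return fut.get();
}

// std::launch::deferred: the call runs lazily inside get(), on the calling thread.
bool deferred_runs_elsewhere() {
    auto fut = std::async(std::launch::deferred, ran_on_other_thread,
                          std::this_thread::get_id());
    return fut.get();
}
```

With the default policy, either of these two behaviors may be picked for each call.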

Consequently, the runtime either

  1. creates 1000 threads and runs them all concurrently, or
  2. creates fewer threads, in which case some invocations of f(i) are performed sequentially in the main thread (within the final loop).

Both these options seem to be generally less efficient than what OpenMP does (create many tasks and run them concurrently in a fixed number of threads).

Is there any way to get the same behavior as what OpenMP tasks provide with C++ threading?
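For reference, the behavior I have in mind could be sketched by hand roughly as follows: a fixed set of worker threads draining one shared queue of tasks. This is only a minimal sketch, not a standard-library facility. TaskPool, submit, square and sum_of_squares are hypothetical names made up for illustration, and square stands in for f, assumed here to return int.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical minimal pool: n workers pull jobs from one shared queue.
class TaskPool {
public:
    explicit TaskPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }
    ~TaskPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }
    // Enqueue fn(arg) and return a future for its result.
    template <class F, class Arg>
    auto submit(F fn, Arg arg) -> std::future<decltype(fn(arg))> {
        using R = decltype(fn(arg));
        auto task = std::make_shared<std::packaged_task<R()>>(
            [fn, arg] { return fn(arg); });
        std::future<R> fut = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push([task] { (*task)(); });
        }
        cv_.notify_one();
        return fut;
    }
private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // shutting down, nothing left to do
                job = std::move(queue_.front());
                queue_.pop();
            }
            job();  // run outside the lock so other workers can proceed
        }
    }
    std::vector<std::thread> workers_;
    std::queue<std::function<void()>> queue_;
    std::mutex m_;
    std::condition_variable cv_;
    bool done_ = false;
};

// Hypothetical stand-in for f.
int square(int i) { return i * i; }

// Run 1000 tasks on 4 worker threads and sum their results.
long sum_of_squares() {
    TaskPool pool(4);
    std::vector<std::future<int>> futures;
    for (int i = 0; i < 1000; i++)
        futures.push_back(pool.submit(square, i));
    long total = 0;
    for (auto& fut : futures) total += fut.get();
    return total;
}
```

Unlike OpenMP tasks, this sketch has no work stealing and no support for tasks that spawn and wait on nested tasks, so it only approximates what the OpenMP runtime does.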
