We have a template method that runs a given member function in parallel, spread over the number of system threads available, using a thread pool.
The member function must take two fixed leading arguments: the first is the thread index and the second is the total number of threads.
This is so the worker can do something like the following:
for (std::size_t i = threadIndex; i < workContainer.size(); i += numThreads)
{
    // Each thread handles every numThreads-th item of workContainer,
    // so the work is split with no overlap between threads.
}
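If it helps to see the striding concretely, here is a tiny standalone program (nothing to do with the real code, names made up) that just prints which indices each thread would take:

#include <cstddef>
#include <iostream>

int main()
{
    const unsigned numThreads = 3;   // pretend pool size
    const std::size_t workSize = 10; // pretend workContainer.size()
    for (unsigned threadIndex = 0; threadIndex < numThreads; ++threadIndex)
    {
        std::cout << "thread " << threadIndex << ":";
        for (std::size_t i = threadIndex; i < workSize; i += numThreads)
            std::cout << ' ' << i;
        std::cout << '\n';
    }
    // Prints:
    // thread 0: 0 3 6 9
    // thread 1: 1 4 7
    // thread 2: 2 5 8
}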
That convention works well, and here is what parallelWork() looks like:
template <class S, class Args>
void parallelWork(S* self, void (S::*workfcn)(const unsigned, const unsigned, Args*), Args* args)
{
    // Wrap the member-function call so any exception is captured per thread
    // and reported once every thread has finished.
    auto work = [&](const unsigned threadIndex, const unsigned nthreads, S* self, Args* args)
    {
        std::string* rc = nullptr;
        try
        {
            (self->*workfcn)(threadIndex, nthreads, args);
        }
        catch (std::exception& err)
        {
            rc = new std::string(err.what());
        }
        return rc;
    };
    ulong nthreads = size();    // size() is the thread pool's thread count
    std::vector<std::string*> rc(nthreads);
    if (1 == nthreads)
    {
        rc[0] = work(0, 1, self, args);
    }
    else
    {
        std::vector<std::future<std::string*>> frc(nthreads);
        for (unsigned i = 0; nthreads > i; ++i)
            frc[i] = enqueue(work, i, nthreads, self, args);    // enqueue() submits to the pool
        // Wait for all threads to finish
        for (unsigned i = 0; nthreads > i; ++i)
            rc[i] = frc[i].get();
    }
    for (unsigned i = 0; nthreads > i; ++i)
    {
        if (rc[i])
        {
            HELAS_ERROR(std::string("Thread ") + std::to_string(i) + ": " + *rc[i]);
            delete rc[i];
        }
    }
}
However, this is limited. The current solution requires that we pack all of the arguments (some of which are references) into a structure, and then pass a pointer to that structure to parallelWork().
Wouldn't it be great if instead we could pass the arguments the member function should take directly? I.e., instead of doing this:
struct ThreadArgs
{
    const std::vector<int>& inArg1;
    double inArg2;
    SomeClass* inarg3;
    std::vector<std::vector<double>>& outArg1; // outer vector is per-thread output
};
void MyClass::doFooInParallel()
{
    std::vector<int> inArg1;
    double inArg2;
    SomeClass* cp = getSomeClass();
    std::vector<std::vector<double>> outArg1(numThreads);
    ThreadArgs args{inArg1, inArg2, cp, outArg1};
    parallelWork(this, &MyClass::foo, &args);
}
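For reference, the worker itself would then look something like this (purely illustrative, using the ThreadArgs and MyClass declarations above; the real foo() and its computation are not shown here):

void MyClass::foo(const unsigned threadIndex, const unsigned numThreads, ThreadArgs* args)
{
    std::vector<double>& myOut = args->outArg1[threadIndex]; // this thread's output slot
    for (std::size_t i = threadIndex; i < args->inArg1.size(); i += numThreads)
        myOut.push_back(args->inArg1[i] * args->inArg2); // made-up computation
}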
Instead, we would like to be able to simply write this:
void MyClass::doFooInParallel()
{
    std::vector<int> inArg1;
    double inArg2;
    SomeClass* cp = getSomeClass();
    std::vector<std::vector<double>> outArg1(numThreads);
    parallelWork(this, &MyClass::foo, inArg1, inArg2, cp, outArg1);
}
It seems like a variadic template would be the solution, changing the start of the method to this:
template <class S, class... Args>
void parallelWork(S* self, void (S::*workfcn)(const unsigned, const unsigned, Args... args), Args... args)
{
    auto work = [&](const unsigned threadIndex, const unsigned nthreads, S* self, Args... args)
This does compile with our current usage of the method. However, there is a problem: the arguments are passed by value instead of by reference.
For inputs that may not be bad, and some copies may even be avoided by copy elision. But for arguments that are outputs of the method (like a vector of size nthreads collecting each thread's results), it obviously doesn't work: the threads write into copies, so running the work in parallel results in nothing actually being done from the caller's point of view.
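To make that output problem concrete, this is the effect in miniature (a standalone toy, not the real code):

#include <iostream>
#include <vector>

void worker(std::vector<double> out) // by value, as the variadic version ends up doing
{
    out.push_back(1.0); // writes into a copy that dies at the end of the call
}

int main()
{
    std::vector<double> results;
    worker(results);
    std::cout << results.size() << '\n'; // prints 0 -- the caller never sees the work
}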
I tried making them references like so:
template <class S, class... Args>
void parallelWork(S* self, void (S::*workfcn)(const unsigned, const unsigned, Args&&... args), Args&&... args)
{
    auto work = [&](const unsigned threadIndex, const unsigned nthreads, S* self, Args&&... args)
However, this fails when we try to use it, with an error along the lines of:
could not match type-parameter-0-1&& against (the actual parameter we tried to pass)
(I also tried a single reference, Args&..., just in case; that failed as well.)
What might be a solution to this? Is it possible in C++11, or only with features from later C++ standards? Would std::forward, std::ref, or std::cref help at all here somehow?
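By std::ref I have in mind the usual trick for getting reference semantics through something that copies its arguments; for example, std::thread decay-copies whatever you hand it, and std::ref is the standard workaround (a standalone illustration with made-up names, not the real code):

#include <functional>
#include <thread>
#include <vector>

void fill(std::vector<int>& out)
{
    out.push_back(42);
}

int main()
{
    std::vector<int> results;
    // std::thread copies its arguments, so `std::thread t(fill, results);`
    // would not even compile against the std::vector<int>& parameter.
    // std::ref passes a reference_wrapper instead, so fill() sees the
    // caller's vector.
    std::thread t(fill, std::ref(results));
    t.join();
    // results now contains one element
}

Is something along those lines applicable here, or would the variadic signature of parallelWork() need to change in some other way?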
This is really more of a conceptual question, so please forgive me that none of this code can be run as-is; getting a runnable example together would be an enormous amount of work, and I think the question is explained well enough to cover it. Let me know if anything needs clarifying.