I'm trying my hand at multithreading in C++11 and cannot figure out why, in the following code, the serial version is much faster than the parallel one.
I understand that in this minimal example the compute function is not worth parallelizing. However, I'd like to use a similar approach to parallelize pixel rendering in a ray tracing algorithm, where compute takes much longer, and I see the same difference in duration there.
I guess I'm missing something about threads. Any help or guidance would be much appreciated.
#include <iostream>
#include <thread>
#include <vector>
#include <chrono>
#include <functional> // std::ref
void compute(double& res)
{
    res = 2*res;
}
void computeSerial(std::vector<double>& res, const size_t& nPoints)
{
    for (size_t i = 0; i < nPoints; i++)
    {
        compute(res[i]);
    }
}
void computeParallel(std::vector<double>& res, const size_t& nPoints)
{
    // hardware_concurrency() may return 0, so fall back to a single worker.
    const unsigned hw = std::thread::hardware_concurrency();
    const size_t numThreads = hw > 1 ? hw - 1 : 1;
    std::vector<std::thread*> pool(numThreads, nullptr);
    size_t nPointsComputed = 0;
    while (nPointsComputed < nPoints)
    {
        size_t firstIndex = nPointsComputed;
        // Start one thread per element for the next batch of numThreads elements.
        for (size_t i = 0; i < numThreads; i++)
        {
            size_t index = firstIndex + i;
            if (index < nPoints)
            {
                pool[i] = new std::thread(compute, std::ref(res[index]));
            }
        }
        // Wait for the whole batch before starting the next one.
        for (size_t i = 0; i < numThreads; i++)
        {
            size_t index = firstIndex + i;
            if (index < nPoints)
            {
                pool[i]->join();
                delete pool[i];
            }
        }
        nPointsComputed += numThreads;
    }
}
int main(void)
{
    size_t pbSize = 1000;
    std::vector<double> vSerial(pbSize, 0);
    std::vector<double> vParallel(pbSize, 0);
    for (size_t i = 0; i < pbSize; i++)
    {
        vSerial[i] = i;
        vParallel[i] = i;
    }
    int numThreads = std::thread::hardware_concurrency();
    std::cout << "Number of threads: " << numThreads << std::endl;
    std::chrono::steady_clock::time_point begin, end;
    begin = std::chrono::steady_clock::now();
    computeSerial(vSerial, pbSize);
    end = std::chrono::steady_clock::now();
    std::cout << "duration serial = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << "[ns]" << std::endl;
    begin = std::chrono::steady_clock::now();
    computeParallel(vParallel, pbSize);
    end = std::chrono::steady_clock::now();
    std::cout << "duration parallel = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << "[ns]" << std::endl;
    return 0;
}
After compiling with clang++ -pthread main.cc, I get the following output:
Number of threads: 6
duration serial = 23561[ns]
duration parallel = 12219928[ns]
The serial version is consistently much faster than the parallel one, regardless of how many doubles are computed.
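For comparison, here is a minimal sketch of a chunked variant, in which each thread is created once and processes a contiguous range instead of a single element. Creating and joining a std::thread costs far more than one multiplication, so this is the usual shape to time against the version above. The names computeChunk and computeParallelChunked are hypothetical and not part of the original code, and the sketch has not been timed against the numbers reported here.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Same per-element work as compute() in the question.
void compute(double& res)
{
    res = 2 * res;
}

// Hypothetical helper: process the half-open index range [first, last).
void computeChunk(std::vector<double>& res, std::size_t first, std::size_t last)
{
    for (std::size_t i = first; i < last; i++)
    {
        compute(res[i]);
    }
}

// Hypothetical variant: one thread per chunk, each created and joined once.
// Assumes nPoints <= res.size().
void computeParallelChunked(std::vector<double>& res, std::size_t nPoints)
{
    // hardware_concurrency() may return 0 when the value is unknown.
    const unsigned hw = std::thread::hardware_concurrency();
    const std::size_t numThreads = hw > 1 ? hw - 1 : 1;
    // Ceiling division so that the chunks cover every element.
    const std::size_t chunk = (nPoints + numThreads - 1) / numThreads;

    std::vector<std::thread> pool;
    pool.reserve(numThreads);
    for (std::size_t first = 0; first < nPoints; first += chunk)
    {
        const std::size_t last = std::min(first + chunk, nPoints);
        pool.emplace_back(computeChunk, std::ref(res), first, last);
    }
    for (std::thread& t : pool)
    {
        t.join();
    }
}
```

It could be called from main in place of computeParallel(vParallel, pbSize), with the same timing code around it. Whether it beats the serial loop for a workload this small is a separate question that only a measurement can settle.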