Sunday, March 8, 2020

Multithreading, why is this serial code faster than its parallel version?

I'm trying my hand at multithreading in C++11 and cannot figure out why, in the following code, the serial version is much faster than the parallel one.

I understand that in this minimal example the compute function is not worth parallelizing. However, I'd like to use a similar approach to parallelize pixel rendering in a ray-tracing algorithm, where each compute call takes much longer, and I see the same difference in duration in that case too.

I guess I'm missing something about threads. Any help or guidance would be much appreciated.

#include <iostream>
#include <thread>
#include <vector>
#include <chrono>

void compute(double& res)
{
    res = 2*res;
}

void computeSerial(std::vector<double>& res, const size_t& nPoints)
{
    for (size_t i = 0; i < nPoints; i++)
    {
        compute(res[i]);
    }
}

void computeParallel(std::vector<double>& res, const size_t& nPoints)
{
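    // hardware_concurrency() may return 0 when the value is unknown; keep at least one worker.
    const unsigned hw = std::thread::hardware_concurrency();
    const size_t numThreads = hw > 1 ? hw - 1 : 1;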
    std::vector<std::thread*> pool(numThreads, nullptr);
    size_t nPointsComputed = 0;
    while(nPointsComputed < nPoints)
    {
        size_t firstIndex = nPointsComputed;
        for (size_t i = 0; i < numThreads; i++)
        {
            size_t index = firstIndex + i;
            if(index < nPoints)
            {
                pool[i] = new std::thread(compute, std::ref(res[index]));
            }
        }
        for (size_t i = 0; i < numThreads; i++)
        {
            size_t index = firstIndex + i;
            if(index < nPoints)
            {
                pool[i]->join();
                delete pool[i];
            }
        }
        nPointsComputed += numThreads;
    }
}

int main(void)
{
    size_t pbSize = 1000;
    std::vector<double> vSerial(pbSize, 0);
    std::vector<double> vParallel(pbSize, 0);
    for (size_t i = 0; i < pbSize; i++)
    {
        vSerial[i] = i;
        vParallel[i] = i;
    }

    int numThreads = std::thread::hardware_concurrency();
    std::cout << "Number of threads: " << numThreads << std::endl;

    std::chrono::steady_clock::time_point begin, end;

    begin = std::chrono::steady_clock::now();
    computeSerial(vSerial, pbSize);
    end = std::chrono::steady_clock::now();
    std::cout << "duration serial   = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << "[ns]" << std::endl;

    begin = std::chrono::steady_clock::now();
    computeParallel(vParallel, pbSize);
    end = std::chrono::steady_clock::now();
    std::cout << "duration parallel = " << std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() << "[ns]" << std::endl;

    return 0;
}

After compiling with clang++ -pthread main.cc, I get the following output:

Number of threads: 6
duration serial   = 23561[ns]
duration parallel = 12219928[ns]

The serial version is consistently much faster than the parallel one, no matter how many doubles are computed.
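
One plausible explanation, offered as a sketch rather than a definitive answer: computeParallel creates and joins a new std::thread for every single element, so the cost of starting threads is likely to dominate the cost of doubling a value. A common alternative is to start each thread once and hand it a contiguous block of indices. The names computeOne, computeRange and computeChunked below are hypothetical and do not appear in the code above; the chunk size is simply the element count divided by the thread count, rounded up.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Same per-element work as compute() in the question (renamed for this sketch).
void computeOne(double& res)
{
    res = 2 * res;
}

// Hypothetical helper: processes the half-open index range [first, last).
void computeRange(std::vector<double>& res, std::size_t first, std::size_t last)
{
    for (std::size_t i = first; i < last; i++)
    {
        computeOne(res[i]);
    }
}

// Hypothetical alternative to computeParallel(): at most numThreads threads
// are created in total, each one working on its own contiguous chunk.
void computeChunked(std::vector<double>& res, std::size_t numThreads)
{
    const std::size_t nPoints = res.size();
    if (numThreads == 0)
    {
        numThreads = 1; // assumption: fall back to a single worker
    }
    // Ceiling division so that every element belongs to some chunk.
    const std::size_t chunk = (nPoints + numThreads - 1) / numThreads;

    std::vector<std::thread> pool;
    pool.reserve(numThreads);
    for (std::size_t first = 0; first < nPoints; first += chunk)
    {
        const std::size_t last = std::min(first + chunk, nPoints);
        pool.emplace_back(computeRange, std::ref(res), first, last);
    }
    // Wait for every chunk to finish before returning.
    for (std::thread& t : pool)
    {
        t.join();
    }
}
```

In main, computeChunked(vParallel, numThreads) could stand in for computeParallel(vParallel, pbSize). Whether it beats the serial loop still depends on how much work each element needs: for a single multiplication the serial loop may remain faster, while a heavier per-pixel computation has a better chance of benefiting.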
