Wednesday, April 15, 2015

Threads seem to be slowing down image processing in C++11

I am writing a function that changes the values of the pixels in an image. It splits the work of shading the pixels across multiple threads: for example, with 4 threads, each thread shades every fourth pixel. What I find strange is that the threaded approach is about 1/10 of a second slower than doing everything in a single loop. I can't figure out why, since I have a quad-core CPU and there is no real synchronization between the threads. I would expect it to be about 4x faster, minus a bit of overhead. Am I doing something wrong here?
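To make the "every fourth pixel" split concrete, here is a minimal sketch of which flat pixel indices each thread ends up with. The helper name pixels_for_thread is hypothetical and is not part of the code in this question; it only mirrors the loop pattern "start at the thread index, step by the thread count".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in the original code): lists the flat pixel
// indices that worker `thread_index` visits when `total` pixels are dealt
// out round-robin to `nthreads` workers.
std::vector<std::size_t> pixels_for_thread(std::size_t thread_index,
                                           std::size_t nthreads,
                                           std::size_t total)
{
    // Precondition: at least one worker, and a valid worker index.
    assert(nthreads > 0 && thread_index < nthreads);

    std::vector<std::size_t> indices;
    // Same stride pattern as the threaded loop: start at the worker's own
    // index and jump ahead by the number of workers each time.
    for (std::size_t p = thread_index; p < total; p += nthreads)
        indices.push_back(p);
    return indices;
}
```

With 4 threads and 10 pixels, pixels_for_thread(1, 4, 10) yields 1, 5, 9. Note that with this split, pixels that sit next to each other in memory are always handled by different threads.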


Note that I set nthreads=1 to measure the single-loop approach.



    void RGBImage::shade(Shader sh, size_t sx, size_t sy, size_t ex, size_t ey)
    {
        validate();

        // An end coordinate of 0 means "up to the image edge".
        if(ex == 0)
            ex = width;
        if(ey == 0)
            ey = height;

        if(sx < 0 || sx >= width || sx >= ex || ex > width || sy < 0 || sy >= height || sy >= ey
            || ey > height)
            throw std::invalid_argument("Bounds Invalid");

        size_t w = ex - sx;
        size_t h = ey - sy;

        // Clamp the thread count to [1, MAX_THREADS].
        size_t nthreads = std::thread::hardware_concurrency();
        if(nthreads > MAX_THREADS)
            nthreads = MAX_THREADS;
        else if(nthreads < 1)
            nthreads = 1;

        // Use fewer threads when each one would get too little work.
        size_t load_per_thread = w * h / nthreads;
        if(load_per_thread < MIN_THREAD_LOAD)
            nthreads = (w * h) / MIN_THREAD_LOAD;

        clock_t start = clock();
        if(nthreads > 1)
        {
            std::unique_ptr<std::thread[]> threads(new std::thread[nthreads]);
            for(size_t i = 0; i < nthreads; i++)
                threads[i] = std::thread([=]()
                {
                    // Thread i shades pixels i, i + nthreads, i + 2 * nthreads, ...
                    for(size_t p = i; p < (w * h); p += nthreads)
                    {
                        size_t x = sx + p % w;
                        size_t y = sy + p / w;
                        sh(raster[y * width + x], x, y);
                    }
                });
            for(size_t i = 0; i < nthreads; i++)
                threads[i].join();
        }
        else
        {
            // Single-loop path: shade every pixel in order.
            for(size_t p = 0; p < (w * h); ++p)
            {
                size_t x = sx + p % w;
                size_t y = sy + p / w;
                sh(raster[y * width + x], x, y);
            }
        }
        // Print the time measured by clock(), in seconds.
        std::cout << ((float)(clock() - start) / CLOCKS_PER_SEC) << std::endl;
    }
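
One detail that may matter for the measurement itself: clock() reports processor time used by the program, not elapsed time. On many platforms (POSIX systems, for example) that figure adds up the CPU time of all threads, so a multithreaded run can look as slow as, or slower than, the single loop even if it finishes sooner. A minimal sketch of a wall-clock measurement with std::chrono::steady_clock follows; time_wall_seconds is a hypothetical helper, not part of the code above, and whether this explains the numbers seen here is an assumption that would need checking.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename Work>
double time_wall_seconds(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();   // run the code being timed
    const auto stop = std::chrono::steady_clock::now();
    // duration<double> defaults to seconds as its unit.
    return std::chrono::duration<double>(stop - start).count();
}
```

For example, wrapping the threaded and single-loop sections in a lambda passed to time_wall_seconds would give elapsed seconds for each, which can then be compared directly.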
