Tuesday, March 3, 2015

C++11 std::thread summation with std::atomic is very slow

I wanted to learn how to use C++11 std::thread with VS2012, so I wrote a very simple C++ console program with two threads that just increment a counter. I also wanted to measure the performance difference when two threads are used. The test program is given below:



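#include <iostream>
#include <thread>
#include <chrono>   // std::chrono clocks and durations used in _tmain
#include <atomic>
#include <conio.h>  // _getch (Windows-specific)
#include <tchar.h>  // _tmain, _TCHAR (Windows-specific)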

std::atomic<long long> sum(0);
//long long sum;

using namespace std;

const int RANGE = 100000000;

// Single-threaded baseline: all 2 * RANGE increments run on the calling thread.
void test_without_threds()
{
    sum = 0;
    for (int j = 0; j < 2; j++)
        for (int k = 0; k < RANGE; k++)
            sum++;
}

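// Thread body: performs RANGE increments on the shared counter.
// tid only identifies the thread; it is not used in the loop.
void call_from_thread(int tid)
{
    for (int k = 0; k < RANGE; k++)
        sum++;
}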

void test_with_2_threds()
{
    std::thread t[2];
    sum = 0;

    // Launch a group of threads
    for (int i = 0; i < 2; ++i) {
        t[i] = std::thread(call_from_thread, i);
    }

    // Join the threads with the main thread
    for (int i = 0; i < 2; ++i) {
        t[i].join();
    }
}

int _tmain(int argc, _TCHAR* argv[])
{
    chrono::time_point<chrono::system_clock> start, end;

    cout << "-----------------------------------------\n";
    cout << "test without threads\n";

    start = chrono::system_clock::now();
    test_without_threds();
    end = chrono::system_clock::now();

    cout << "finished calculation in "
         << chrono::duration_cast<chrono::milliseconds>(end - start).count()
         << " ms.\n";

    cout << "sum:\t" << sum << "\n";

    cout << "-----------------------------------------\n";
    cout << "test with 2 threads\n";

    start = chrono::system_clock::now();
    test_with_2_threds();
    end = chrono::system_clock::now();

    cout << "finished calculation in "
         << chrono::duration_cast<chrono::milliseconds>(end - start).count()
         << " ms.\n";

    cout << "sum:\t" << sum << "\n";

    _getch(); // wait for a key press before the console window closes
    return 0;
}


Now, when I use a plain long long for the counter (the commented-out declaration), I get an incorrect result: 100000000 instead of 200000000. I am not sure why. I suppose the two threads are modifying the counter at the same time, but I do not see how that goes wrong, because ++ looks like a single, very simple instruction. It seems as if each thread caches the sum variable at the beginning. Performance is 110 ms with two threads versus 200 ms with one thread.


So, according to the documentation, the correct way is to use std::atomic. With it, however, performance is much worse in both cases: about 3300 ms without threads and 15820 ms with two threads. What is the correct way to use std::atomic in this case?

