Thursday, 3 November 2016

GCC -march=native slows down threaded code

I made an odd observation: when I enable the CXX flag -march=native, the timing of my threaded code snippet doubles.

// -> sequential
int n = (int) 1e7;
Vector<double, 32> a;
a.init(n);
for (int i = 0; i < n; i++)
    a(i) = 1.0;

double r1;
Timer::start();
psum(a.data, n, r1);
Timer::stop();

std::cout << "timing (ms): " << Timer::get_timing() << std::endl;
std::cout << r1 << std::endl;
// <-

// -> threading simple
int n_threads = 2;
Vector<double, 32> b;
b.init(n);
for (int i = 0; i < n; i++)
    b(i) = 2.0;

double r2;
Timer::start();
std::thread t1(psum, b.data + n/2, n/2, std::ref(r1));
psum(b.data, n/2, r2);
t1.join();
Timer::stop();

std::cout << "timing (ms): " << Timer::get_timing() << std::endl;
std::cout << r1 + r2 << std::endl;
// <-

Specifically, the threaded example jumps from 8 ms to 16 ms with -march=native, and 16 ms is also the timing of the sequential code, so the speedup from the second thread disappears entirely.
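
For anyone who wants to reproduce this without my Vector and Timer helpers, here is a minimal self-contained sketch of the same test. Note the assumptions: psum is written here as a plain accumulation loop (the real body is not shown above), std::vector replaces the 32-byte-aligned Vector, and sum_sequential, sum_two_threads and time_ms are hypothetical helper names introduced only for this sketch.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Assumed stand-in for psum: accumulate n doubles and store the result in r.
void psum(const double* data, int n, double& r) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += data[i];
    r = s;
}

// Sequential case: one call over the whole array.
double sum_sequential(const std::vector<double>& v) {
    double r = 0.0;
    psum(v.data(), static_cast<int>(v.size()), r);
    return r;
}

// Threaded case: upper half on a worker thread, lower half on the caller.
double sum_two_threads(const std::vector<double>& v) {
    const int n = static_cast<int>(v.size());
    const int half = n / 2;
    double r_hi = 0.0, r_lo = 0.0;
    std::thread t(psum, v.data() + half, n - half, std::ref(r_hi));
    psum(v.data(), half, r_lo);
    t.join();
    return r_lo + r_hi;
}

// Wall-clock time of a single call to f, in milliseconds.
template <class F>
double time_ms(F&& f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Filling a std::vector<double> with 1e7 elements and wrapping each of the two sum functions in time_ms should give numbers comparable to the ones above, once with and once without -march=native. A single run is noisy, so repeating each measurement a few times is probably wise.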

Extra info:

  • g++ 6.2 compiler
  • Ubuntu 16.10
  • Intel i5-6200U (Skylake)
  • vectors are 32-byte aligned
  • compile with c++ -std=c++11 -O3 -pthread ...
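
Since -march=native lets GCC use whatever instruction set extensions the local CPU supports, it may help to check which ones are actually switched on in each build. The sketch below only reports the compiler's predefined target macros; enabled_isa is a hypothetical helper name, and it says nothing about which instructions the optimizer ends up choosing for psum.

```cpp
#include <string>

// Reports which SIMD-related target macros are defined for this
// translation unit. __SSE2__, __AVX__, __AVX2__ and __FMA__ are
// predefined by GCC and Clang when the corresponding extension is enabled.
std::string enabled_isa() {
    std::string s;
#ifdef __SSE2__
    s += "SSE2 ";
#endif
#ifdef __AVX__
    s += "AVX ";
#endif
#ifdef __AVX2__
    s += "AVX2 ";
#endif
#ifdef __FMA__
    s += "FMA ";
#endif
    return s.empty() ? "none" : s;
}
```

Printing the result of enabled_isa() from a build with the flag and from one without it shows what differs between the two configurations.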

Any idea where this comes from?
