Thursday, 22 February 2018

Is the performance of notify_one() really this bad?

For the measurements below I used x86_64 GNU/Linux with kernel 4.4.0-109-generic #132-Ubuntu SMP, running on an AMD FX(tm)-8150 Eight-Core Processor (which has a 64-byte cache-line size).

The full source code, which is independent of other libraries, can be obtained here: https://github.com/CarloWood/ai-threadsafe-testsuite/blob/master/src/condition_variable_test.cxx

Just compile with:

g++ -pthread -std=c++11 -O3 condition_variable_test.cxx

What I really tried to do here is measure how long it takes to execute a call to notify_one() when one or more threads are actually waiting, relative to how long that takes when no thread is waiting on the condition_variable used.

To my astonishment I found that both cases are in the microsecond range: when 1 thread is waiting it takes about 14 to 20 microseconds; when no thread is waiting it apparently takes less, but still at least 2 microseconds.
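The no-waiter case can be timed in isolation with something like the following. This is only a minimal sketch, not the linked test program: the function name average_notify_ns is made up for illustration, and the number it reports depends heavily on the C++ library, the kernel and the hardware.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>

// Hypothetical helper (not part of the linked test): returns the average
// wall-clock cost, in nanoseconds, of one notify_one() call on a
// condition_variable that no thread is waiting on.
double average_notify_ns(int iterations)
{
  assert(iterations > 0);
  std::condition_variable cv;
  auto const start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i)
    cv.notify_one();            // Nobody is waiting; only the call overhead is measured.
  auto const stop = std::chrono::steady_clock::now();
  std::chrono::duration<double, std::nano> const elapsed = stop - start;
  return elapsed.count() / iterations;
}
```

Calling, for example, average_notify_ns(1000000) and printing the result gives a rough per-call figure to compare against the numbers above.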

In other words, suppose you have a producer/consumer scenario in which the consumers call wait() every time there is nothing for them to do, and a producer calls notify_one() every time something new is written to the queue, on the assumption that the implementation of std::condition_variable is smart enough not to spend a lot of time when no threads are waiting in the first place. Then, oh horror, your application will become a lot slower than with the code that I wrote to TEST how long a call to notify_one() takes when a thread is waiting!

It seems that the code that I used is a must to speed up such scenarios. And that confuses me: why on earth isn't the code that I wrote already part of std::condition_variable?

The code in question is the following. Instead of doing:

// Producer thread:
add_something_to_queue();
cv.notify_one();

// Consumer thread:
if (queue.empty())
{
  std::unique_lock<std::mutex> lk(m);
  cv.wait(lk);
}

you can gain a speed-up of a factor of 1000 by doing:

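// Producer thread:
add_something_to_queue();
// s_idle is a std::atomic<int> counting the consumers that are waiting.
int waiting;
while ((waiting = s_idle.load(std::memory_order_relaxed)) > 0)
{
  // Try to claim one waiting consumer; if the exchange fails, retry.
  if (!s_idle.compare_exchange_weak(waiting, waiting - 1, std::memory_order_relaxed, std::memory_order_relaxed))
    continue;
  // Only now pay for the lock and the call to notify_one().
  std::unique_lock<std::mutex> lk(m);
  cv.notify_one();
  break;
}

// Consumer thread:
if (queue.empty())
{
  std::unique_lock<std::mutex> lk(m);
  // Register as idle before going to sleep.
  s_idle.fetch_add(1, std::memory_order_relaxed);
  cv.wait(lk);
}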
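As a sketch only, the same idea could be packaged in a small helper like the one below. The names IdleAwareNotifier, notify_one_if_waiting and demo_round_trip are hypothetical and do not come from the linked test. The sketch also differs from the snippet above in two ways, both of which are assumptions of this sketch rather than measured choices: the waiter re-checks a predicate after registering itself (which also covers spurious wakeups), and the counter uses the default sequentially consistent ordering instead of memory_order_relaxed, so that either the notifier sees the waiter or the waiter sees the published state.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical helper: a condition variable that counts its waiters, so that
// a notifier can skip the comparatively expensive notify_one() when idle.
class IdleAwareNotifier
{
 public:
  // Block until pred() returns true. pred is expected to read state that the
  // notifier publishes with sequentially consistent atomics.
  template<typename Predicate>
  void wait(Predicate pred)
  {
    std::unique_lock<std::mutex> lk(m_);
    idle_.fetch_add(1);         // Announce this waiter before testing pred.
    cv_.wait(lk, pred);         // Tests pred first and again after every wakeup.
    idle_.fetch_sub(1);
  }

  // Call after publishing new state. Returns true if notify_one() was called.
  bool notify_one_if_waiting()
  {
    if (idle_.load() == 0)
      return false;             // Fast path: nobody registered as waiting.
    std::lock_guard<std::mutex> lk(m_);   // Waits until a registered waiter is inside cv_.wait() or has left.
    cv_.notify_one();
    return true;
  }

 private:
  std::mutex m_;
  std::condition_variable cv_;
  std::atomic<int> idle_{0};
};

// One consumer waits for a flag, the producer sets it and notifies if needed.
// Returns true when the consumer finished and no waiter is left registered.
bool demo_round_trip()
{
  IdleAwareNotifier notifier;
  std::atomic<bool> ready{false};
  assert(!notifier.notify_one_if_waiting());    // No waiter yet: fast path taken.
  std::thread consumer([&] { notifier.wait([&] { return ready.load(); }); });
  ready.store(true);                            // Publish the state first...
  notifier.notify_one_if_waiting();             // ...then wake a waiter, if any.
  consumer.join();
  return !notifier.notify_one_if_waiting();
}
```

In a producer/consumer setting the predicate would be something like "the queue is not empty", provided the queue itself is safe to inspect from the consumer. Whether the stronger memory ordering costs a noticeable part of the speed-up is something this sketch does not measure.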

Am I making some horrible mistake here? Or are my findings correct?
