For the measurements below I've been using x86_64 GNU/Linux with kernel 4.4.0-109-generic #132-Ubuntu SMP, running on an AMD FX(tm)-8150 Eight-Core Processor (which has a 64-byte cache-line size).
The full source code can be obtained here: https://github.com/CarloWood/ai-threadsafe-testsuite/blob/master/src/condition_variable_test.cxx
It is independent of other libraries; just compile it with:
g++ -pthread -std=c++11 -O3 condition_variable_test.cxx
What I really tried to do here is measure how long it takes to execute a call to notify_one() when one or more threads are actually waiting, relative to how long that takes when no thread is waiting on the condition_variable used.
To my astonishment I found that both cases are in the microsecond range: when one thread is waiting it takes about 14 to 20 microseconds; when no thread is waiting it apparently takes less, but still at least 2 microseconds.
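To give an idea of the kind of timing loop I mean, here is a minimal sketch; it is not the linked test program, the helper name time_notify and the iteration count are my own, and the "one waiter" number is only a rough indication because the waiting thread may not have re-entered wait() before every call:

#include <atomic>
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
std::atomic<bool> quit{false};

// Average the cost of a single notify_one() over `iterations` calls.
double time_notify(int iterations)
{
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < iterations; ++i)
    cv.notify_one();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::micro>(stop - start).count() / iterations;
}

int main()
{
  int const iterations = 100000;

  // Case 1: nobody is waiting on cv.
  std::cout << "no waiter: " << time_notify(iterations) << " microseconds per notify_one()\n";

  // Case 2: one thread that keeps going back to wait on cv.
  std::thread waiter([]{
    std::unique_lock<std::mutex> lk(m);
    while (!quit)
      cv.wait(lk);
  });
  std::this_thread::sleep_for(std::chrono::milliseconds(100));  // Give it time to reach wait().
  std::cout << "one waiter: " << time_notify(iterations) << " microseconds per notify_one()\n";

  // Shut the waiter down; holding m here guarantees it is inside wait().
  {
    std::lock_guard<std::mutex> lk(m);
    quit = true;
  }
  cv.notify_one();
  waiter.join();
}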
In other words, suppose you have a producer/consumer scenario in which the consumer calls wait() every time there is nothing to do, and a producer calls notify_one() every time something new is written to the queue, assuming that the implementation of std::condition_variable will be smart enough not to spend a lot of time when no thread is waiting in the first place... then, oh horror, your application becomes a lot slower than with the code that I wrote to TEST how long a call to notify_one() takes when a thread is waiting!
It seems that the code that I used is a must to speed up such scenarios. And that confuses me: why on earth isn't it already part of std::condition_variable?
The code in question is the following. Instead of doing:
// Producer thread:
add_something_to_queue();
cv.notify_one();

// Consumer thread:
if (queue.empty())
{
  std::unique_lock<std::mutex> lk(m);
  cv.wait(lk);
}
You can gain a speed-up of a factor of 1000 by doing:
// Producer thread:
add_something_to_queue();
// s_idle is an atomic integer (std::atomic<int>) declared elsewhere; it counts
// consumers that are waiting (or about to wait) on the condition variable.
int waiting;
while ((waiting = s_idle.load(std::memory_order_relaxed)) > 0)
{
  // Claim one waiting consumer; compare_exchange_weak can fail spuriously or
  // because another thread changed s_idle, in which case we simply retry.
  if (!s_idle.compare_exchange_weak(waiting, waiting - 1, std::memory_order_relaxed, std::memory_order_relaxed))
    continue;
  std::unique_lock<std::mutex> lk(m);
  cv.notify_one();
  break;
}

// Consumer thread:
if (queue.empty())
{
  std::unique_lock<std::mutex> lk(m);
  s_idle.fetch_add(1, std::memory_order_relaxed);
  cv.wait(lk);
}
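For completeness, here is one way the two snippets might be glued together into a self-contained program. The declarations of m, cv, queue and s_idle are my assumptions about what the snippets use (the linked test program is organized differently), and the consumer re-checks the queue under the lock before waiting, a small deviation from the snippet above to avoid a lost wakeup when a push races with the unlocked empty() check:

#include <atomic>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>

int const total = 1000000;              // Number of items to produce/consume.

std::mutex m;
std::condition_variable cv;
std::queue<int> queue;                  // Protected by m.
std::atomic<int> s_idle{0};             // Consumers that are waiting (or about to wait).

void producer()
{
  for (int n = 0; n < total; ++n)
  {
    {
      std::unique_lock<std::mutex> lk(m);
      queue.push(n);                    // add_something_to_queue();
    }
    // Only pay for the mutex and notify_one() when a consumer advertised that it is idle.
    int waiting;
    while ((waiting = s_idle.load(std::memory_order_relaxed)) > 0)
    {
      if (!s_idle.compare_exchange_weak(waiting, waiting - 1, std::memory_order_relaxed, std::memory_order_relaxed))
        continue;
      std::unique_lock<std::mutex> lk(m);
      cv.notify_one();
      break;
    }
  }
}

void consumer()
{
  for (int consumed = 0; consumed < total;)
  {
    std::unique_lock<std::mutex> lk(m);
    while (queue.empty())               // Re-checked under the lock (see above).
    {
      s_idle.fetch_add(1, std::memory_order_relaxed);
      cv.wait(lk);
    }
    queue.pop();                        // "Consume" one item.
    ++consumed;
  }
}

int main()
{
  std::thread c(consumer);
  std::thread p(producer);
  p.join();
  c.join();
  std::cout << "consumed " << total << " items\n";
}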
Am I making some horrible mistake here? Or are my findings correct?