I have a multi-threaded app that uses the GPU, which is inherently single-threaded, and the actual API I use, cv::gpu::FAST_GPU, does crash when I try to use it from multiple threads, so basically I have:
static std::mutex s_FAST_GPU_mutex;

{
    // Serialize all GPU use behind one mutex
    std::lock_guard<std::mutex> guard(s_FAST_GPU_mutex);
    cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
}
Now, benchmarking the code shows me that FAST_GPU() in isolation is faster than the CPU FAST(), but in the actual application my other threads spend so much time waiting for the lock that the overall throughput is worse.
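To see why the lock hurts throughput even though each call is faster in isolation, here is a minimal, self-contained sketch of the effect (no OpenCV; fakeGpuDetect() is a made-up stand-in, and the timings are chosen purely for illustration):

#include <chrono>
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

static std::mutex s_gpu_mutex;

constexpr int kThreads = 8;
constexpr int kCallsPerThread = 50;

// Made-up stand-in: faster per call than a hypothetical 5 ms CPU
// version, but it must be serialized behind the mutex.
void fakeGpuDetect() { std::this_thread::sleep_for(std::chrono::milliseconds(2)); }

int main()
{
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> pool;
    for (int t = 0; t < kThreads; ++t)
    {
        pool.emplace_back([]
        {
            for (int i = 0; i < kCallsPerThread; ++i)
            {
                // All 8 threads funnel through one mutex, so the 2 ms calls
                // run back to back: ~8 * 50 * 2 = 800 ms of wall time, versus
                // ~50 * 5 = 250 ms if each thread ran a slower 5 ms CPU call
                // in parallel instead.
                std::lock_guard<std::mutex> guard(s_gpu_mutex);
                fakeGpuDetect();
            }
        });
    }
    for (auto& th : pool)
        th.join();

    const auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start).count();
    std::cout << "wall time: " << ms << " ms\n";
}

With these made-up numbers the serialized "fast" path costs ~800 ms of wall time while the parallel "slow" path would finish in ~250 ms, which is the same shape of result I see in the real application.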
Looking through the documentation, and at this answer, it seems that something like this might be possible:
static std::mutex s_FAST_GPU_mutex;
static std::unique_lock<std::mutex> s_FAST_GPU_lock(s_FAST_GPU_mutex, std::defer_lock);

{
    // Create an unlocked guard around the shared lock
    std::unique_lock<decltype(s_FAST_GPU_lock)> guard(s_FAST_GPU_lock, std::defer_lock);
    if (guard.try_lock())
    {
        // Lock acquired: the GPU is free, take the fast path
        cv::gpu::FAST_GPU(/*params*/)(/*parameters*/);
    }
    else
    {
        // GPU busy in another thread: fall back to the CPU implementation
        cv::FAST(/*parameters*/);
    }
}
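For comparison, the only other shape I can think of keeps all the lock state local to the calling thread: construct a fresh std::unique_lock with std::try_to_lock on every call and test owns_lock() afterwards (a minimal sketch; gpuDetect() and cpuDetect() are made-up stubs standing in for the OpenCV calls):

#include <mutex>

static std::mutex s_FAST_GPU_mutex;

void gpuDetect() {} // stand-in for cv::gpu::FAST_GPU(/*params*/)(/*parameters*/)
void cpuDetect() {} // stand-in for cv::FAST(/*parameters*/)

void detect()
{
    // The unique_lock lives on this thread's stack, so its "owns the
    // mutex" flag is never shared between threads; only the mutex is.
    std::unique_lock<std::mutex> guard(s_FAST_GPU_mutex, std::try_to_lock);
    if (guard.owns_lock())
        gpuDetect(); // we hold the mutex; the destructor unlocks it
    else
        cpuDetect(); // mutex was busy; nothing is held or unlocked
}

This version avoids sharing the unique_lock object itself between threads, but I'm not sure whether the static-lock version above is equivalent to it.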
Does this make sense? Am I creating a race condition that might come back to bite me later? Using std::defer_lock twice feels odd...