Tuesday, 2 February 2016

Least restrictive memory ordering for spin-lock with two atomics

I have some worker threads performing time-critical processing at regular intervals (approx 1 kHz). Each cycle, the workers are woken up to do a chore, which should (on average) be complete before the next cycle begins. They all operate on the same object, which can occasionally be modified by the main thread.

To prevent races, but allow a modification of the object to occur before the next cycle, I have used a spin-lock along with an atomic counter to record how many threads are still doing work:

#include <atomic>

class Foo {
public:
    void Modify();                  // called occasionally by the main thread
    void DoWork( SomeContext& );    // called by each worker every cycle
private:
    std::atomic_flag locked = ATOMIC_FLAG_INIT;
    std::atomic<int> workers_busy{ 0 };   // brace-init: std::atomic is not copyable
};

void Foo::Modify()
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ;   // spin
    while( workers_busy.load() != 0 ) ;                           // spin

    // Modifications happen here ....

    locked.clear( std::memory_order_release );
}

void Foo::DoWork( SomeContext& )
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ;   // spin
    ++workers_busy;
    locked.clear( std::memory_order_release );

    // Processing happens here ....

    --workers_busy;
}

This allows all remaining work to complete immediately, provided at least one thread has already begun, and always blocks before another worker can begin work for the next cycle.
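For context, the threads are driven roughly like the sketch below. The names (RunWorkers, keep_running), the four-worker count, and the fixed 1 ms cycle are illustrative only, and SomeContext is assumed to be default-constructible here:

#include <atomic>
#include <chrono>
#include <thread>
#include <vector>

// Illustrative driver only: each worker wakes every ~1 ms and calls DoWork(),
// while the main thread occasionally calls Modify() on the same Foo.
void RunWorkers( Foo& foo, std::atomic<bool>& keep_running )
{
    std::vector<std::thread> workers;
    for( int i = 0; i < 4; ++i )
    {
        workers.emplace_back( [&]
        {
            SomeContext ctx;                              // per-worker context
            auto next = std::chrono::steady_clock::now();
            while( keep_running.load( std::memory_order_relaxed ) )
            {
                foo.DoWork( ctx );                        // protected as shown above
                next += std::chrono::milliseconds( 1 );   // ~1 kHz cycle
                std::this_thread::sleep_until( next );
            }
        } );
    }
    for( auto& w : workers ) w.join();
}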

The atomic_flag is accessed with "acquire" and "release" memory orders, which appears to be an accepted way of implementing spin-locks with C++11. According to the documentation at cppreference.com:

memory_order_acquire : A load operation with this memory order performs the acquire operation on the affected memory location: no memory accesses in the current thread can be reordered before this load. This ensures that all writes in other threads that release the same atomic variable are visible in the current thread.

memory_order_release : A store operation with this memory order performs the release operation: no memory accesses in the current thread can be reordered after this store. This ensures that all writes in the current thread are visible in other threads that acquire the same atomic variable and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic.

As I understand the above, this is enough to synchronise protected accesses across threads to provide mutex behaviour, without being overly conservative about memory ordering.
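To make the quoted guarantees concrete, here is a minimal, self-contained illustration of the acquire/release pairing (separate from my code above): the release store publishes the plain write to payload, and any thread that observes the flag with an acquire load is guaranteed to see that write.

#include <atomic>
#include <cassert>
#include <thread>

int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{ false };

void producer()
{
    payload = 42;                                     // ordinary write
    ready.store( true, std::memory_order_release );   // publish: the write above cannot move below this store
}

void consumer()
{
    while( !ready.load( std::memory_order_acquire ) ) ;  // spin: reads below cannot move above this load
    assert( payload == 42 );                              // guaranteed by the release/acquire pairing
}

int main()
{
    std::thread t1( producer ), t2( consumer );
    t1.join();
    t2.join();
}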

What I wish to know is whether the memory ordering can be relaxed any further, because a side-effect of this pattern is that I'm using a spin-lock mutex to synchronise another atomic variable.

The calls to ++workers_busy, --workers_busy and workers_busy.load() all currently have the default memory order, memory_order_seq_cst. Given that the only interesting use for this atomic is to unblock Modify() with --workers_busy (which is not synchronised by the spin-lock mutex), could the same acquire-release memory order be used with this variable, using a "relaxed" increment? i.e.

void Foo::Modify()
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ;
    while( workers_busy.load( std::memory_order_acquire ) != 0 ) ;  // <--
    // ....
    locked.clear( std::memory_order_release );
}

void Foo::DoWork( SomeContext& )
{
    while( locked.test_and_set( std::memory_order_acquire ) ) ;
    workers_busy.fetch_add( 1, std::memory_order_relaxed );         // <--
    locked.clear( std::memory_order_release );
    // ....
    workers_busy.fetch_sub( 1, std::memory_order_release );         // <--
}

Is this correct? Is it possible for any of these memory orderings to be relaxed further? And does it even matter?
