Thursday, September 5, 2019

How to express in C++11 ordinary store (export) and load (import) barriers (fences)?

The following code implements lock-free inter-thread communication that requires memory fences, but release-acquire semantics are not appropriate here (in fact, the code needs something like the inverse of release-acquire semantics):

volatile bool valid=true;
volatile uint8_t blob[1024];

void zero_blob() {
    valid=false;
    STORE_BARRIER;
    for (size_t i = 0; i < 1024; ++i) blob[i] = 0;  // memset() cannot take a pointer to volatile in C++
}

uint8_t try_get(size_t index) {
    uint8_t res = blob[index];
    LOAD_BARRIER;
    return valid ? res : 0; 
}
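
For illustration, STORE_BARRIER and LOAD_BARRIER in the code above could be defined roughly as follows. This is only a sketch, assuming GCC or Clang inline assembly and the per-architecture instruction choices described in the text; the macro bodies, the preprocessor checks, and the conservative `__sync_synchronize()` fallback are assumptions, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical macro definitions (GCC/Clang syntax assumed).
#if defined(__x86_64__) || defined(__i386__)
  // x86 keeps stores ordered with stores and loads with loads,
  // so only the compiler has to be prevented from reordering.
  #define STORE_BARRIER __asm__ __volatile__("" ::: "memory")
  #define LOAD_BARRIER  __asm__ __volatile__("" ::: "memory")
#elif defined(__powerpc__) || defined(__powerpc64__)
  #define STORE_BARRIER __asm__ __volatile__("lwsync" ::: "memory")
  #define LOAD_BARRIER  __asm__ __volatile__("lwsync" ::: "memory")
#elif defined(__sparc__)
  #define STORE_BARRIER __asm__ __volatile__("membar #StoreStore" ::: "memory")
  #define LOAD_BARRIER  __asm__ __volatile__("membar #LoadLoad" ::: "memory")
#else
  // Assumed fallback: a full barrier, stronger than strictly required.
  #define STORE_BARRIER __sync_synchronize()
  #define LOAD_BARRIER  __sync_synchronize()
#endif

volatile bool valid = true;
volatile std::uint8_t blob[1024];

void zero_blob() {
    valid = false;
    STORE_BARRIER;  // the store to 'valid' must become visible before the stores to 'blob'
    // A loop is used because memset() does not accept a pointer to volatile.
    for (std::size_t i = 0; i < 1024; ++i) blob[i] = 0;
}

std::uint8_t try_get(std::size_t index) {
    assert(index < 1024);
    std::uint8_t res = blob[index];
    LOAD_BARRIER;   // the load from 'blob' must complete before the load from 'valid'
    return valid ? res : 0;
}
```

Note that this relies on compiler- and hardware-specific behaviour rather than on the C++11 memory model, which is exactly the gap the question is about.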

I can make this code correct on every hardware architecture simply by using native memory barriers: on Intel no memory barriers are needed here, on SPARC (RMO) membar #StoreStore and membar #LoadLoad suffice, and on PowerPC lwsync works for both. So it's no big deal, and the code is a typical example of using store and load barriers. Now, which C++11 construct should I use to make the code correct, given that I don't want to convert 'blob' to std::atomic objects? Doing so would make 'blob' the guard object and 'valid' the guarded one, whereas it's the other way around. Converting 'valid' to a std::atomic object is fine with me, but then there are no barriers to guarantee correctness. To make this clear, consider the following code:

volatile std::atomic<bool> valid{true};
volatile uint8_t blob[1024];

void zero_blob() {
    valid.store(false, std::memory_order_release);
    for (size_t i = 0; i < 1024; ++i) blob[i] = 0;  // memset() cannot take a pointer to volatile in C++
}

uint8_t try_get(size_t index) {
    uint8_t res = blob[index];
    return valid.load(std::memory_order_acquire) ? res : 0; 
}

This code is incorrect because the barriers are in the wrong places: the writes to 'blob' can be reordered before the write to 'valid', and/or the load from 'valid' can be reordered before the load from 'blob'. I thought that C++11 provided std::atomic_thread_fence to deal with such constructions, and that the code should be:

volatile std::atomic<bool> valid{true};
volatile uint8_t blob[1024];

void zero_blob() {
    valid.store(false, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);
    for (size_t i = 0; i < 1024; ++i) blob[i] = 0;  // memset() cannot take a pointer to volatile in C++
}

uint8_t try_get(size_t index) {
    uint8_t res = blob[index];
    std::atomic_thread_fence(std::memory_order_acquire);
    return valid.load(std::memory_order_relaxed) ? res : 0;
}

Unfortunately C++11 says:

A release fence A synchronizes with an acquire fence B if there exist atomic
operations X and Y, both operating on some atomic object M, such that A is
sequenced before X, X modifies M, Y is sequenced before B, and Y reads the
value written by X or a value written by any side effect in the hypothetical
release sequence X would head if it were a release operation.

which clearly states that std::atomic_thread_fence has to be placed on the opposite sides of the operations on the atomic object.
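
One way to read the quoted rule is that the atomic operations X and Y would have to be the accesses to 'blob', since those are the ones sequenced after the release fence and before the acquire fence. A minimal sketch of that reading follows, assuming (against the preference stated earlier) that the elements of 'blob' are declared as std::atomic<uint8_t> and accessed with relaxed ordering. The helper fill_blob is hypothetical and exists only so the example can be exercised.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

std::atomic<bool> valid{true};
// Assumption: each element is atomic so that relaxed accesses to it
// can play the role of X and Y in the fence-to-fence rule.
std::atomic<std::uint8_t> blob[1024];

// Hypothetical helper, used only to put non-zero data into 'blob'.
void fill_blob(std::uint8_t v) {
    for (auto& b : blob) b.store(v, std::memory_order_relaxed);
}

void zero_blob() {
    valid.store(false, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);          // fence A
    for (auto& b : blob) b.store(0, std::memory_order_relaxed);   // X: sequenced after A
}

std::uint8_t try_get(std::size_t index) {
    assert(index < 1024);
    std::uint8_t res = blob[index].load(std::memory_order_relaxed); // Y: sequenced before B
    std::atomic_thread_fence(std::memory_order_acquire);            // fence B
    // If Y read a zero written by X, A synchronizes with B, so the store of
    // 'false' to 'valid' happens before this load and must be observed.
    return valid.load(std::memory_order_relaxed) ? res : 0;
}
```

Under this assumption the fences stay where the earlier attempt put them; what changes is which object counts as the atomic object M in the rule.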
