samedi 28 février 2015

Memory ordering behavior of std::atomic::load

While I have to admit I don't fully understand the formal definitions of the different relaxed semantics of memory order I thought that the sequentially consistent ordering was pretty straightforward in that it guarantees that "a single total order exists in which all threads observe all modifications in the same order." To me this implies that the std::atomic::load with the default memory order of std::memory_order_seq_cst will also act as a memory fence. This is further corroborated by the following statement under "Sequentially-consistent ordering":


Total sequential ordering requires a full memory fence CPU instruction on all multi-core systems.


Yet, my simple example below demonstrates this is not the case with MSVC 2013, gcc 4.9 (x86) and clang 3.5.1 (x86), where the atomic load simply translates to a load instruction.



#include <atomic>

std::atomic_long al;

#ifdef _WIN32
__declspec(noinline)
#else
__attribute__((noinline))
#endif
long load() {
return al.load(std::memory_order_seq_cst);
}

int main(int argc, char* argv[]) {
long r = load();
}


With gcc this looks like:



load():
mov rax, QWORD PTR al[rip] ; <--- plain load here, no fence or xchg
ret
main:
call load()
xor eax, eax
ret


I'll omit the msvc and clang which are essentially identical. Now on gcc for ARM we get what I expected:



load():
dmb sy ; <---- data memory barrier here
movw r3, #:lower16:.LANCHOR0
movt r3, #:upper16:.LANCHOR0
ldr r0, [r3]
dmb sy ; <----- and here
bx lr
main:
push {r3, lr}
bl load()
movs r0, #0
pop {r3, pc}


Am I wrong to assume that the atomic::load should also act as a memory barrier ensuring that all previous non-atomic writes will become visible by other threads?


This is not an academic question, it results in a subtle race condition in our code which put into question my understanding of the behavior of std::atomic.


Aucun commentaire:

Enregistrer un commentaire