This question is based on "Can't relaxed atomic fetch_add reorder with later loads on x86, like store can?" I agree with the answer given there: on x86, 00 will never occur, because a.fetch_add compiles to an instruction with a lock prefix, which acts as a full barrier, so loads can't be reordered above the fetch_add. On other architectures such as ARM or MIPS, it can print 00. I have two follow-up questions about the store buffer on x86 and ARM.
On a 4-CPU x86_64 Core i3:
- Thread 1 runs foo and performs its fetch_add, so a is 1 before b.load().
- Thread 2 runs bar and performs its fetch_add, so b is 1 before a.load().
- The updated values of a and b then become visible from the store buffers of CPU 1 and CPU 2 (after some indeterminate amount of time), and both loads return 1, printing 11.
- I never get 11 on my PC (Core i3, x86_64). Is 11 a valid output on x86 under ISO C++, or am I missing something?
- x86_64 has the advantage that fetch_add acts as a full barrier.
- On arm64, the output can sometimes be 00 because of CPU instruction reordering.
- On arm64 or some other architecture, can the output be 00 even without reordering? My reasoning is this: the value that foo's a.fetch_add(1) leaves in its store buffer is not yet visible to bar's a.load(), and the value that bar's b.fetch_add(1) leaves in its store buffer is not yet visible to foo's b.load(). Hence we get 00 without any reordering.
// g++ -O2 -pthread axbx.cpp ; while [ true ]; do ./a.out | grep "00" ; done
#include <cstdio>
#include <thread>
#include <atomic>
using namespace std;

atomic<int> a, b;
int reta, retb;

void foo() {
    a.fetch_add(1, memory_order_relaxed); // add to a is stored in store buffer of cpu0
    //a.store(1, memory_order_relaxed);
    retb = b.load(memory_order_relaxed);
}

void bar() {
    b.fetch_add(1, memory_order_relaxed); // add to b is stored in store buffer of cpu1
    //b.store(1, memory_order_relaxed);
    reta = a.load(memory_order_relaxed);
}

int main() {
    thread t[2]{ thread(foo), thread(bar) };
    t[0].join(); t[1].join();
    printf("%d%d\n", reta, retb);
    return 0;
}