c++11: Why does std::memory_order_relaxed permit side-effect reordering?

Wednesday, 13 July 2022

While reading the cppreference article on the relaxed memory ordering introduced in C++11, I've struggled to understand the example it uses to illustrate the (re)ordering of concurrent memory accesses.

The article provides the following:

Atomic operations tagged memory_order_relaxed are not synchronization operations; they do not impose an order among concurrent memory accesses. They only guarantee atomicity and modification order consistency.

For example, with x and y initially zero,
// Thread 1:
r1 = y.load(std::memory_order_relaxed); // A
x.store(r1, std::memory_order_relaxed); // B
// Thread 2:
r2 = x.load(std::memory_order_relaxed); // C 
y.store(42, std::memory_order_relaxed); // D
[...]

The thing that seems nonsensical to me is the following paragraph (with points of interest bolded):

[...]

is allowed to produce r1 == r2 == 42 because, although A is sequenced-before B within thread 1 and C is sequenced-before D within thread 2, nothing prevents D from appearing before A in the modification order of y, and B from appearing before C in the modification order of x. The side-effect of D on y could be visible to the load A in thread 1 while the side effect of B on x could be visible to the load C in thread 2. In particular, this may occur if D is completed before C in thread 2, either due to compiler reordering or at runtime.
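
To experiment with the quoted scenario, the two threads can be wrapped into a self-contained function. This is only a minimal sketch: the names `RelaxedOutcome` and `run_relaxed_once` are hypothetical and do not come from the article, and `r1`/`r2` are assumed to be plain `int` variables local to each thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type: what each thread's load observed in one run.
struct RelaxedOutcome {
    int r1;
    int r2;
};

// Runs the two threads from the quoted example once (hypothetical helper).
inline RelaxedOutcome run_relaxed_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    RelaxedOutcome out{0, 0};

    std::thread t1([&] {
        int r1 = y.load(std::memory_order_relaxed);  // A
        x.store(r1, std::memory_order_relaxed);      // B
        out.r1 = r1;  // written only by this thread, read after join()
    });
    std::thread t2([&] {
        int r2 = x.load(std::memory_order_relaxed);  // C
        y.store(42, std::memory_order_relaxed);      // D
        out.r2 = r2;  // written only by this thread, read after join()
    });

    t1.join();
    t2.join();

    // After both joins, the final values are fixed regardless of interleaving:
    // y holds 42 (its only store) and x holds whatever A loaded.
    assert(y.load(std::memory_order_relaxed) == 42);
    assert(x.load(std::memory_order_relaxed) == out.r1);
    return out;
}
```

Calling `run_relaxed_once()` in a loop and tallying the `(r1, r2)` pairs shows which outcomes a given compiler and machine actually produce. The standard permits `r1 == r2 == 42`, but a particular implementation may never exhibit it, so not seeing it in practice proves nothing about what is allowed.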

As the article suggests, the only way that the result r1 == r2 == 42 could happen is if D is reordered before C, with A and B executing sequentially after D. This, however, seems to contradict the definition of sequenced-before, which, per the C++20 standard, seems to imply that every side effect of a full-expression or expression statement must be visible before the next statement executes [1][2].

My question is: why is this reordering permitted? Modifying an object is considered a side effect [3], and the full-expression on line C, r2 = x.load(std::memory_order_relaxed), performs one by assigning the value computed by x.load(std::memory_order_relaxed) to r2. Intuitively, lines C and D carry no data dependency on each other and should be freely reorderable (unlike lines A and B, where the value stored to x depends on the value previously assigned to r1). The wording of the standard around sequenced-before seems to suggest otherwise, however, implying that the evaluation of C and all of its side effects (such as storing the loaded value to r2) strongly happens-before everything in D [4]. What am I missing here?
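
For contrast, here is a sketch of the same two threads with every operation tagged `std::memory_order_seq_cst` instead of relaxed. As far as I understand the model, all four operations then take part in a single total order that is consistent with each thread's sequencing, so `r1 == r2 == 42` would require the cycle D before A before B before C before D and cannot occur. The names `SeqCstOutcome` and `run_seq_cst_once` are hypothetical and used only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical result type for the sequentially consistent variant.
struct SeqCstOutcome {
    int r1;
    int r2;
};

// Same threads as the relaxed example, but with seq_cst on every operation
// (hypothetical helper, not taken from the article).
inline SeqCstOutcome run_seq_cst_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    SeqCstOutcome out{0, 0};

    std::thread t1([&] {
        int r1 = y.load(std::memory_order_seq_cst);  // A
        x.store(r1, std::memory_order_seq_cst);      // B
        out.r1 = r1;
    });
    std::thread t2([&] {
        int r2 = x.load(std::memory_order_seq_cst);  // C
        y.store(42, std::memory_order_seq_cst);      // D
        out.r2 = r2;
    });

    t1.join();
    t2.join();

    // With a single total order over A, B, C and D, both loads cannot see 42.
    assert(!(out.r1 == 42 && out.r2 == 42));
    return out;
}
```

The only difference from the relaxed version is the ordering argument, which suggests that the surprising outcome stems from what other threads are allowed to observe rather than from the sequencing of statements within one thread.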

[1]:

Full-expressions ― §6.9.1 P9:

Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

[2]:

Expression-statement ― §8.3 P1:

[...]

All side effects from an expression statement are completed before the next statement is executed. An expression statement with the expression missing is called a null statement.

[Note 1: Most statements are expression statements — usually assignments or function calls. A null statement is useful to carry a label just before the } of a compound statement and to supply a null body to an iteration statement such as a while statement (8.6.2). —end note]

[3]:

Side-effect (relevant section bolded) ― §6.9.1 P7:

Reading an object designated by a volatile glvalue (7.2.1), modifying an object, calling a library I/O function, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression (or a subexpression) in general includes both value computations (including determining the identity of an object for glvalue evaluation and fetching a value previously assigned to an object for prvalue evaluation) and initiation of side effects. When a call to a library I/O function returns or an access through a volatile glvalue is evaluated the side effect is considered complete, even though some external actions implied by the call (such as the I/O itself) or by the volatile access may not have completed yet.

[4]:

To quote cppreference:

Note: informally, if A strongly happens-before B, then A appears to be evaluated before B in all contexts.
