Thursday, November 12, 2020

Difference between Interlocked, InterlockedAcquire, and InterlockedRelease if single thread reordering is impossible

In all likelihood, a lockless implementation is already overkill for the purposes of my application, but I wanted to look into memory barriers and lockless-ness anyways in case I ever actually need to use these concepts in the future.

From what I can tell:

  1. an "InterlockedAcquire" function performs an atomic operation while preventing the compiler from moving code statements after the InterlockedAcquire to before the InterlockedAcquire.

  2. an "InterlockedRelease" function performs an atomic operation while preventing the compiler from moving code statements before the InterlockedRelease to after the InterlockedRelease.

  3. a vanilla "Interlocked" function performs an atomic operation while preventing the compiler from moving code statements in either direction across the Interlocked call.
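To make the acquire/release distinction concrete, here is a minimal sketch that uses portable C++ `std::atomic` memory orders as a stand-in for the Windows Interlocked variants; it is not the Interlocked API itself. The names `payload`, `ready`, `publish`, `await_payload`, and `run_handoff` are hypothetical and chosen only for illustration. The point it tries to show is that the ordering matters for what another thread observes, not only for how statements are arranged within one thread.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration; not from the original post.
static int payload = 0;                 // ordinary, non-atomic data
static std::atomic<bool> ready{false};  // flag that publishes the payload

// Writer: the release store keeps the payload write from moving after it.
void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_release);
}

// Reader: the acquire load keeps the payload read from moving before it.
int await_payload() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;
}

// Runs one writer and one reader and returns what the reader observed.
int run_handoff(int value) {
    ready.store(false, std::memory_order_relaxed);  // reset before threads start
    int seen = -1;
    std::thread reader([&seen] { seen = await_payload(); });
    std::thread writer([value] { publish(value); });
    writer.join();
    reader.join();
    assert(ready.load(std::memory_order_relaxed));  // the writer has run
    return seen;
}
```

With the release/acquire pair in place, a reader that sees `ready == true` is guaranteed to see the payload written before it. If both operations were relaxed, that guarantee would not hold, even though each thread's own statements are still in the same program order.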

My question is, if a function is structured such that the compiler can't reorder any of the code anyways because doing so would affect single-threaded behavior, is there a difference between any of the variants of an Interlocked function, or are they all effectively the same? Is the only difference between them how they interact with code reordering?

For a more concrete example, here's my current application - the produce() function as part of what will eventually be a multiple producer, single consumer queue built using a circular buffer:

template <typename T>
class Queue {
    private:
        long headIndex;
        long tailIndex;
        T* array[MAXQUEUESIZE];
    public:
        Queue() {
            headIndex = 0;
            tailIndex = 0;
            memset(array, 0, sizeof(array));
        }
        ~Queue() {
        }

        bool produce(T value) {
            //1) prevents concurrent calls to produce() from causing corruption
            //   by atomically reserving a slot; on failure, retry with the value just observed:
            long indexRetVal = tailIndex;
            long reservedIndex;
            do {
                reservedIndex = indexRetVal;
                indexRetVal = InterlockedCompareExchange(&tailIndex, (reservedIndex + 1) % MAXQUEUESIZE, reservedIndex);
            } while (indexRetVal != reservedIndex);

            //2) allocates the node.
            T* newValPtr = (T*) malloc(sizeof(T));
            if (newValPtr == nullptr) {
                OutputDebugStringA("Queue: malloc returned null");
                return false;
            }
            //plain assignment into malloc'd storage assumes T is trivially copyable
            *newValPtr = value;

            //3) prevents a concurrent call to consume from causing corruption by atomically replacing the old pointer:
            T* valPtrRetVal = (T*) InterlockedCompareExchangePointer((PVOID volatile*) (array + reservedIndex), newValPtr, nullptr);
            //if the previous value wasn't null, then our circular buffer overflowed:
            if (valPtrRetVal != nullptr) {
                OutputDebugStringA("Queue: circular buffer overflowed");
                free(newValPtr); //the node was never published, so release it
                return false;
            }

            //otherwise, everything worked fine
            return true;
        }
};

As I understand it, 3) will occur after 1) and 2) regardless of what I do anyways, but I should change 1) to an InterlockedRelease because I don't care whether it occurs before or after 2) and I should let the compiler decide.
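As a rough illustration of where ordering might matter in steps 1) and 3), here is a sketch written with portable `std::atomic` rather than the Interlocked functions used above. Everything in it is an assumption made for the example: the names `kQueueSize`, `tail`, `slots`, `reserve_slot`, `publish_slot`, and `run_producers` are hypothetical, the element type is fixed to `int`, and the choice of relaxed ordering for the reservation and release ordering for the publication is one plausible reading, not a verified answer to the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-ins for the queue's members; sizes chosen for the demo.
constexpr long kQueueSize = 64;
static std::atomic<long> tail{0};
static std::atomic<int*> slots[kQueueSize];  // static storage: all start null

// Step 1: reserve a slot. Only atomicity is needed here, since no data is
// published by this operation, so relaxed ordering is assumed sufficient.
long reserve_slot() {
    long expected = tail.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak reloads `expected` with the current value.
    while (!tail.compare_exchange_weak(expected, (expected + 1) % kQueueSize,
                                       std::memory_order_relaxed)) {
    }
    return expected;
}

// Step 3: publish the node. Release ordering makes the write to *node visible
// to a consumer that reads the slot with acquire ordering.
bool publish_slot(long index, int* node) {
    int* expected = nullptr;
    return slots[index].compare_exchange_strong(expected, node,
                                                std::memory_order_release,
                                                std::memory_order_relaxed);
}

// Demo: several producers each fill a few slots; returns the number filled.
// Published nodes are intentionally left allocated, as no consumer exists here.
int run_producers(int threads, int per_thread) {
    assert(threads * per_thread <= kQueueSize);  // stay below the wrap-around
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([=] {
            for (int i = 0; i < per_thread; ++i) {
                int* node = new int(t * per_thread + i);
                if (!publish_slot(reserve_slot(), node)) delete node;
            }
        });
    }
    for (auto& th : pool) th.join();
    int filled = 0;
    for (long i = 0; i < kQueueSize; ++i) {
        if (slots[i].load(std::memory_order_acquire) != nullptr) ++filled;
    }
    return filled;
}
```

In this sketch the ordering that matters is on the publishing operation in step 3), because that is the point where another thread can start reading the node's contents. The reservation in step 1) only hands out distinct indices.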
