dimanche 1 janvier 2017

lock-free synchronization, fences and memory order (store operation with acquire semantics)

I am migrating a project that was run on bare-bone to linux, and need to eliminate some {disable,enable}_scheduler calls. :)

So I need a lock-free sync solution in a single writer, multiple readers scenario, where the writer thread cannot be blocked. I came up with the following solution (pseudo-code), which does not fit to the usual acquire-release ordering:

class RWSync {
    std::atomic<int> version; // incremented after every modification
    std::atomic_bool invalid; // true during write
public:
  template<typename F> void sync(F lambda) {
    KWu32 currentVersion;
    do {
      do { // wait until the object is valid
        currentVersion = version.load(std::memory_order_acquire);
      } while (invalid.load(std::memory_order_acquire));
      lambda();
      std::atomic_thread_fence(std::memory_order_seq_cst);
      // check if something changed
    } while (version.load(std::memory_order_acquire) != currentVersion
        || invalid.load(std::memory_order_acquire));
  }
  void beginWrite() {
    invalid.store(true, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  void endWrite() {
    std::atomic_thread_fence(std::memory_order_seq_cst);
    version.fetch_add(1, std::memory_order_release);
    invalid.store(false, std::memory_order_release);
  }
}

I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite, and read the payload only inside the lambda function passed to sync().

As you can see, here I have an atomic store in beginWrite() where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).

  1. Is this code race-free and work as I expect?

  2. If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)

  3. Can I drop the fence in endWrite()?

  4. Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.

  5. Is there any simplification / optimization opportunity?

+1. I happily accept any better idea as the name of this class :)

Aucun commentaire:

Enregistrer un commentaire