Monday, 27 September 2021

Choose a constexpr function based on a runtime value and use it inside a hot loop

I need to traverse a vector, read each element, and map it to its remainder modulo a divisor (num_partitions). The modulo operation is cheaper when the divisor is a power of two, because it reduces to a single bitwise AND. So I need to choose between mod and mod_power2 at runtime. A rough outline follows. Please assume that I am using templates to visit the vector.

The bit manipulation tricks are taken from https://graphics.stanford.edu/~seander/bithacks.html

#include <cstddef>
#include <vector>

// True when v is a power of two; false for 0.
static inline constexpr bool if_power2(int v) {
  return v && !(v & (v - 1));
}

static inline constexpr int mod_power2(int val, int num_partitions) {
  return val & (num_partitions - 1);
}

static inline constexpr int mod(int val, int num_partitions) {
  return val % num_partitions;
}
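
To make the relationship between the two helpers concrete, here is a minimal sketch that compares them over a range of inputs. The helper name agrees_for_nonnegative is hypothetical and not part of the original code. One assumption is worth stating loudly: the two functions only agree for non-negative val, because % keeps the sign of the dividend while the bitwise AND does not.

```cpp
#include <cassert>

constexpr bool if_power2(int v) { return v && !(v & (v - 1)); }
constexpr int mod_power2(int val, int num_partitions) { return val & (num_partitions - 1); }
constexpr int mod(int val, int num_partitions) { return val % num_partitions; }

// Hypothetical helper: returns true if mod and mod_power2 agree for every
// val in [0, limit). Precondition: n is a power of two.
inline bool agrees_for_nonnegative(int n, int limit) {
  assert(if_power2(n));
  for (int val = 0; val < limit; ++val) {
    if (mod_power2(val, n) != mod(val, n)) return false;
  }
  return true;
}

// For a negative dividend the results differ: % keeps the sign of val,
// while the AND yields a value in [0, n).
static_assert(mod(-3, 8) == -3, "% truncates toward zero");
static_assert(mod_power2(-3, 8) == 5, "AND with 7 keeps the low three bits");
```

If the input vector can hold negative values, the choice between the two functions changes the result and is no longer purely a performance decision.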

template<typename Func>
void visit(const std::vector<int> &data, Func &&func) {
  for (size_t i = 0; i < data.size(); i++) {
    func(i, data[i]);
  }
}

void run1(const std::vector<int> &v1, int num_partitions, std::vector<int> &v2) {
  if (if_power2(num_partitions)) {
    visit(v1,
          [&](size_t i, int v) {
            v2[i] = mod_power2(v, num_partitions);
          });
  } else {
    visit(v1,
          [&](size_t i, int v) {
            v2[i] = mod(v, num_partitions);
          });
  }
}

void run2(const std::vector<int> &v1, int num_partitions, std::vector<int> &v2) {
  const auto part = if_power2(num_partitions) ? mod_power2 : mod;

  visit(v1, [&](size_t i, int v) {
    v2[i] = part(v, num_partitions);
  });
}

My question is about run1 versus run2. I prefer run2 because it is easier to read and avoids code duplication. But when I check both on godbolt (https://godbolt.org/z/3ov59rb5s), as far as I understand, run1 is inlined better than run2.

So, is there a better way to write the run function without compromising performance?
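
One possible direction, offered as a sketch and not as a verified answer: in run2 the variable part is a function pointer chosen at runtime, so inside the loop the compiler may be left with an indirect call. If each operation is given its own type, the runtime branch happens once, outside the loop, and every instantiation of the loop has a statically known callee. The names ModPower2, Mod, run_impl, and run3 below are hypothetical, and I have not checked whether this produces the same assembly as run1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

static inline constexpr bool if_power2(int v) { return v && !(v & (v - 1)); }

// Hypothetical function objects: one distinct type per operation, so the
// callee is part of the template argument instead of a runtime pointer.
struct ModPower2 {
  constexpr int operator()(int val, int n) const { return val & (n - 1); }
};
struct Mod {
  constexpr int operator()(int val, int n) const { return val % n; }
};

template <typename Func>
void visit(const std::vector<int> &data, Func &&func) {
  for (std::size_t i = 0; i < data.size(); i++) {
    func(i, data[i]);
  }
}

// The loop body is written once; Part decides which operation is compiled in.
template <typename Part>
void run_impl(const std::vector<int> &v1, int num_partitions,
              std::vector<int> &v2, Part part) {
  visit(v1, [&](std::size_t i, int v) { v2[i] = part(v, num_partitions); });
}

// The runtime decision is made once, before entering the hot loop.
void run3(const std::vector<int> &v1, int num_partitions, std::vector<int> &v2) {
  assert(v2.size() >= v1.size());  // as in run1/run2, v2 must be pre-sized
  if (if_power2(num_partitions)) {
    run_impl(v1, num_partitions, v2, ModPower2{});
  } else {
    run_impl(v1, num_partitions, v2, Mod{});
  }
}
```

The branch still appears in the source, but only the one-line dispatch is duplicated, not the lambda. Whether this matches run1 in practice would need to be confirmed on godbolt with the same compiler and flags.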
