I need to traverse a vector, read each element, and map it to its remainder modulo a divisor. Modulo is fast when the divisor is a power of 2, so I need to choose between mod and mod_power2 at runtime. The following is a rough outline. Please assume that I am using templates to visit the vector.
Bit manipulation tricks were taken from https://graphics.stanford.edu/~seander/bithacks.html
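#include <cstddef>
#include <vector>

// True when v is a positive power of 2.
static inline constexpr bool if_power2(int v) {
    return v && !(v & (v - 1));
}

// Valid only when num_partitions is a power of 2;
// matches val % num_partitions for non-negative val.
static inline constexpr int mod_power2(int val, int num_partitions) {
    return val & (num_partitions - 1);
}

static inline constexpr int mod(int val, int num_partitions) {
    return val % num_partitions;
}

// Calls func(index, element) for every element of data.
template<typename Func>
void visit(const std::vector<int> &data, Func &&func) {
    for (size_t i = 0; i < data.size(); i++) {
        func(i, data[i]);
    }
}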

// Branches once, then runs a separate loop for each case.
void run1(const std::vector<int> &v1, int num_partitions, std::vector<int> &v2) {
    if (if_power2(num_partitions)) {
        visit(v1,
              [&](size_t i, int v) {
                  v2[i] = mod_power2(v, num_partitions);
              });
    } else {
        visit(v1,
              [&](size_t i, int v) {
                  v2[i] = mod(v, num_partitions);
              });
    }
}

// Picks a function pointer once, then calls through it inside a single loop.
void run2(const std::vector<int> &v1, int num_partitions, std::vector<int> &v2) {
    const auto part = if_power2(num_partitions) ? mod_power2 : mod;
    visit(v1, [&](size_t i, int v) {
        v2[i] = part(v, num_partitions);
    });
}
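
As a sanity check on the bit trick, here is a minimal sketch that compares the two helpers. The helpers are repeated so the snippet compiles on its own (C++17 assumed for message-less static_assert), and agree_below is a hypothetical name introduced only for this check. It shows that the two agree for non-negative values and differ for negative ones, so the substitution is only safe if the data is known to be non-negative.

```cpp
#include <cassert>

// Same helpers as in the outline, repeated so this compiles standalone.
constexpr bool if_power2(int v) { return v && !(v & (v - 1)); }
constexpr int mod_power2(int val, int num_partitions) { return val & (num_partitions - 1); }
constexpr int mod(int val, int num_partitions) { return val % num_partitions; }

// Compile-time checks of the power-of-2 test.
static_assert(if_power2(8));
static_assert(!if_power2(12));
static_assert(!if_power2(0));

// For a non-negative value both helpers give the same remainder.
static_assert(mod_power2(13, 8) == mod(13, 8));

// For a negative value they differ: % truncates toward zero,
// while the mask keeps the low bits of the two's complement pattern.
static_assert(mod(-3, 4) == -3);
static_assert(mod_power2(-3, 4) == 1);

// Hypothetical helper: true when both helpers agree for every v in [0, limit).
bool agree_below(int limit, int num_partitions) {
    for (int v = 0; v < limit; ++v) {
        if (mod_power2(v, num_partitions) != mod(v, num_partitions)) {
            return false;
        }
    }
    return true;
}
```

Calling agree_below(1000, 8) is expected to return true, since 8 is a power of 2 and all tested values are non-negative.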
My question is about run1 vs run2. I prefer run2 because it is easier to read and has no code duplication. But when I check both in godbolt (https://godbolt.org/z/3ov59rb5s), AFAIU, run1 is inlined better than run2.
So, is there a better way to write a run function without compromising on performance?
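
One possible direction, sketched under assumptions rather than measured: in run2, part is a function pointer chosen at runtime, so the call inside the loop is likely indirect and harder to inline. Passing the operation as a template parameter gives each instantiation a concrete callable type while the loop body is still written only once. The names run_with and run3 are hypothetical, and whether the generated code actually matches run1 would need to be confirmed in the compiler output.

```cpp
#include <cstddef>
#include <vector>

constexpr bool if_power2(int v) { return v && !(v & (v - 1)); }

template <typename Func>
void visit(const std::vector<int> &data, Func &&func) {
    for (std::size_t i = 0; i < data.size(); i++) {
        func(i, data[i]);
    }
}

// Hypothetical helper: the loop body appears once, and Op is a distinct
// type per lambda, so the call to op is a direct call in each instantiation.
template <typename Op>
void run_with(const std::vector<int> &v1, int num_partitions,
              std::vector<int> &v2, Op op) {
    visit(v1, [&](std::size_t i, int v) {
        v2[i] = op(v, num_partitions);
    });
}

// Hypothetical run3: the branch happens once, outside the loop.
void run3(const std::vector<int> &v1, int num_partitions, std::vector<int> &v2) {
    if (if_power2(num_partitions)) {
        run_with(v1, num_partitions, v2,
                 [](int val, int n) { return val & (n - 1); });
    } else {
        run_with(v1, num_partitions, v2,
                 [](int val, int n) { return val % n; });
    }
}
```

As with run1 and run2, v2 is assumed to already have at least as many elements as v1. The duplication is reduced to the two short dispatch calls, at the cost of one extra template.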