mercredi 17 août 2016

Comparing optimized builds with switch case and polymorphism

I have the need to performance test two solutions - one that uses polymorphism to execute switch on type and one that uses a switch case to select which of some functions to execute. I really need to optimize this code. I wrote the following test case (You can simply copy paste the code, compile it with g++ -std=c++14 -O3 and run it with echo 1 | ./a.out!) The code is really simple if you read it!

#include <iostream>
#include <chrono>
#include <functional>
#include <array>
#include <cassert>
#include <vector>
#include <memory>
using namespace std;

struct profiler
{
    std::string name;
    std::chrono::high_resolution_clock::time_point p;
    profiler(std::string const &n) :
        name(n), p(std::chrono::high_resolution_clock::now()) { }
    ~profiler()
    {
        using dura = std::chrono::duration<double>;
        auto d = std::chrono::high_resolution_clock::now() - p;
        std::cout << name << ": "
            << std::chrono::duration_cast<dura>(d).count()
            << std::endl;
    }
};
#define PROFILE_BLOCK(pbn) profiler _pfinstance(pbn)

class Base {
public:
    virtual int increment(int in) {
        return in + 2;
    }
};
class Derived : public Base {
public:
    int increment(int in) override {
        return ++in;
    }
};

int increment_one(int in) {
    return in + 2;
}
int increment_two(int in) {
    return ++in;
}
int increment_three(int in) {
    return in + 4;
}
int increment_four(int in) {
    return in + 2;
}

static constexpr unsigned long long NUMBER_LOOP{5000000000};
int main() {

    int which_function;
    cin >> which_function;

    {
        PROFILE_BLOCK("nothing");
    }

    {
        PROFILE_BLOCK("switch case");
        auto counter = 0;
        for (unsigned long long i  = 0; i < NUMBER_LOOP; ++i) {
            switch(which_function) {
            case 0:
                counter = increment_one(counter);
                break;
            case 1:
                counter = increment_two(counter);
                break;
            case 2:
                counter = increment_three(counter);
                break;
            case 3:
                counter = increment_four(counter);
                break;
            default:
                assert(false);
                break;
            }
        }
        cout << counter << endl;
    }

    {
        PROFILE_BLOCK("polymorphism");
        auto counter = 0;
        std::unique_ptr<Base> ptr_base{new Derived()};
        for (unsigned long long i = 0; i < NUMBER_LOOP; ++i) {
            counter = ptr_base->increment(counter);
        }
    }

    return 0;
}

The output I get when I build with g++ -std=c++14 -O3 and run with echo 1 | ./a.out is

nothing: 1.167e-06
705032704
switch case: 4.089e-06
polymorphism: 9.299

I am failing to understand what exactly is causing the switch-case to be almost as fast as the nothing case. Is this because of inlining? Is it because the compiler precomputes the values for each input scenario and puts them in a lookup table? What causes the switch-case to be so fast?

And how can I go about writing a more fair performance test for this scenario? In general I never understand whether the code is fast because of the straight unoptimized translation between the C++ code and assembly or whether its the compiler precomputing a value and completely skipping compilation and producing "no-op style" code.

Note the profiler struct has been copied straight off another SO answer and is not that relevant to the question other than the fact that it measures time

Note As pointed out in the comments below by @dau_sama running the same test on a linux box with gcc instead of clang results in the switch case taking much longer (3.34 in this case) but still much lesser than the polymorphism case

Aucun commentaire:

Enregistrer un commentaire