Friday, October 30, 2015

g++ faulty optimization with specialized template

I'm having a problem with g++ (4.9.2) optimization that produces faulty code, and it is puzzling to me. By faulty, I mean the program's output is fundamentally different between optimized (-O1, -O2, or -O3) and non-optimized (-O0) compilation. And, of course, the optimized code is the wrong one.

I have a class similar to std::bitset, where information is stored at the bit level. It can be instantiated with any number of bits, but there is a template specialization of Bits for widths of 8 bits or fewer.

#include <iostream>
using namespace std;

// generalized class Bits, uses array of specialized, 1-byte Bits
template <unsigned int bits=8, bool _=(bits>8)>
class Bits {
    Bits<8,false> reg[(bits+7)>>3];

public:
    void set(int pos)  { reg[pos>>3].set(pos%8);  };
    void clr(int pos)  { reg[pos>>3].clr(pos%8);  };
    bool get(int pos)  { reg[pos>>3].get(pos%8);  };
};

// specialized, 1-byte Bits (flag stored in a char)
template <unsigned int bits> class Bits<bits,false> {
    char reg;

public:
    Bits() : reg(0) {};
    Bits(int r) : reg(r) {};

    void set(int pos) { reg |=  mark(pos);        };
    void clr(int pos) { reg &= ~mark(pos);        };
    bool get(int pos) { return (reg & mark(pos)); };

    static int mark(int pos) { return ( 1 << pos ); };
};  


int main() {
    Bits<16> b;
    Bits<8> c;

    b.set(1);
    c.set(1);

    cout << b.get(1) << endl;
    cout << c.get(1) << endl;

    return 0;
}

The test is simple: set a bit, then print that bit's state to stdout. This is done with a 16-bit Bits object (the generalized template) and an 8-bit Bits object (the specialized template). The expected answer is TRUE for both objects, and when I compile with no optimization (i.e. g++-4.9 -O0 main.cpp), that is exactly what I get. The output of ./a.out is:

1
1

But when I compile with -O1 optimization (i.e. g++-4.9 -O1 main.cpp), the result is different AND partially wrong:

0
1

Specifically, Bits<8> tests correctly under both -O0 and -O3, but Bits<16> tests correctly only with -O0 and not with -O1.

The optimizer (at -O1, -O2, and -O3 alike) just optimizes out all the Bits member functions and jumps straight to the final results, calculated at compile time. Obviously the optimizer is making some error, but I don't know what the root cause is. Does anyone know what I should be looking for to debug the problem?
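Note: the root cause is actually visible in the listing above, and it is not an optimizer bug. In the generalized template, get() is declared to return bool but contains no return statement:

bool get(int pos)  { reg[pos>>3].get(pos%8); };

Flowing off the end of a value-returning function is undefined behavior in C++, so at -O1 and above the compiler is entitled to assume that code path is never taken and to constant-fold the call to anything at all, which matches the -O0 vs. -O1 divergence described above. A minimal fix, touching only that one member function:

// Generalized Bits::get() must forward the result of the
// underlying 1-byte get(); the original version dropped it.
bool get(int pos)  { return reg[pos>>3].get(pos%8); };

Compiling with -Wall (which enables -Wreturn-type) makes g++ warn that control reaches the end of a non-void function, catching this class of bug before the optimizer exposes it.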
