Tuesday, 16 August 2022

How to express float constants precisely in source code

I have some C++11 code produced by a code generator that contains a large array of floats, and I want to make sure that the compiled values are exactly the same as the values in the generator (assuming that both follow the same floating-point standard, e.g. IEEE 754).

So I figured the best way to do it is to store the values as hex representations and interpret them as float in the code.

Edit for Clarification: The code generator takes the float values and converts them to their corresponding hex representations. The target code is supposed to convert back to float.

It looks something like this:

const unsigned int data[3] = { 0x3d13f407U, 0x3ea27884U, 0xbe072dddU};
float const* ptr = reinterpret_cast<float const*>(&data[0]);

This works and gives me access to all the data elements as floats, but I recently stumbled upon the fact that this is actually undefined behavior and only works because my compiler resolves it the way I intended:

https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8

https://en.cppreference.com/w/cpp/language/reinterpret_cast

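As far as I understand it, the cast itself compiles fine, but reading an unsigned int object through a float pointer violates the strict aliasing rule, so the standard does not define what happens when the resulting pointer is dereferenced.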

So basically I have three options:

  1. Use std::memcpy and hope that the compiler will be able to optimize the copy away

  2. Store the data not as hex-values but in a different way.

  3. Use std::bit_cast from C++20.

I cannot use 3) because I'm stuck with C++11.

I don't have the resources to store the data array twice, so I would have to rely on the compiler to optimize this. Due to this, I don't particularly like 1) because it could stop working if I changed compilers or compiler settings.
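For reference, option 1 might look like the minimal sketch below. The helper name load_float is hypothetical, and the static_assert encodes the assumption that float and unsigned int have the same size on the target. Mainstream optimizing compilers typically reduce a fixed-size std::memcpy like this to a plain load, but the standard does not promise that, which is exactly the concern above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Bit patterns taken from the example array in the question.
static const unsigned int data[3] = { 0x3d13f407U, 0x3ea27884U, 0xbe072dddU };

// Assumption: float and unsigned int have the same size on the target platform.
static_assert(sizeof(float) == sizeof(unsigned int),
              "this sketch assumes float and unsigned int have the same size");

// Hypothetical helper: returns element i of data reinterpreted as a float.
// Copying the bytes with std::memcpy avoids the aliasing violation of the cast.
inline float load_float(std::size_t i)
{
    assert(i < sizeof(data) / sizeof(data[0]));
    float f;
    std::memcpy(&f, &data[i], sizeof f);
    return f;
}
```

Only one element is copied per call, so the array itself does not need to be stored twice.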

So that leaves me with 2):

Is there a standardized way to express float values in source code so that they map to the exact float value when compiled? Does the ISO float standard define this in a way that guarantees that any compiler will follow the interpretation? I imagine that if I deviate from what the compiler expects, I run the risk of getting the neighboring float instead of the number I actually want.
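One candidate for option 2, sketched under assumptions: std::numeric_limits<float>::max_digits10 (9 for an IEEE 754 single-precision float) is the number of significant decimal digits that is enough to identify every float uniquely, provided that both the formatting and the parsing round correctly. The helper names to_literal and round_trips are hypothetical. The check uses std::strtof as a stand-in for the compiler's own literal parsing, which it cannot verify directly, and infinities and NaNs are not handled.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <limits>
#include <string>

// Hypothetical generator-side helper: formats a finite float as a decimal
// literal with max_digits10 significant digits and an 'f' suffix.
inline std::string to_literal(float value)
{
    char buf[64];
    // %.{N}e prints one digit before the decimal point and N after it,
    // so N = max_digits10 - 1 yields max_digits10 significant digits.
    std::snprintf(buf, sizeof buf, "%.*ef",
                  std::numeric_limits<float>::max_digits10 - 1,
                  static_cast<double>(value));
    return std::string(buf);
}

// Hypothetical self-check: parses the literal text back with std::strtof
// (which stops at the trailing 'f') and compares the bytes with the input.
inline bool round_trips(float value)
{
    const std::string text = to_literal(value);
    const float parsed = std::strtof(text.c_str(), nullptr);
    return std::memcmp(&parsed, &value, sizeof value) == 0;
}
```

The generator could run round_trips on every value before emitting it and fall back to the hex representation for any value that fails.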

I would also take alternative ideas if there is an option 4 I forgot.
