Tuesday, June 14, 2022

reinterpret_cast is way more expensive on windows? [closed]

So I am converting a vector of i32 values and a vector of float16 values into a single vector of u8, using the following code:

template <typename T>
void convertToU8(const T& inValue, std::vector<uint8_t>& outValues) {
    // View the value's object representation as raw bytes.
    const auto inputU8 = reinterpret_cast<const uint8_t*>(&inValue);
    // Append the bytes one at a time.
    for (size_t i = 0; i < sizeof(inValue); ++i)
        outValues.push_back(inputU8[i]);
}

vector<uint8_t> u8ValuesBuf;
u8ValuesBuf.reserve(estimatedSize);
for (auto i : i32Buffer) {
    convertToU8<int32_t>(i, u8ValuesBuf);
}
for (auto i : float16Buffer) {
    convertToU8<float16>(i, u8ValuesBuf);
}

From what I see at runtime, this part of the project takes 30000+ ms on Windows. The same code on Linux is much faster and completes everything in under 3100 ms. There is no conditional compilation that could add extra work on Windows.
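The timings are wall-clock measurements along these lines (a rough sketch of the kind of harness used, not the exact code; the conversion loops are elided):

#include <chrono>
#include <iostream>

const auto start = std::chrono::steady_clock::now();
// ... run the conversion loops shown above ...
const auto stop = std::chrono::steady_clock::now();
std::cout << "conversion took "
          << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
          << " ms\n";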

To isolate the issue, here is what I did: I replaced the for loops as shown below, copying from dummy u8 vectors of the matching sizes instead.

vector<uint8_t> u8ValuesBuf;
u8ValuesBuf.reserve(estimatedSize);
// Dummy bytes matching the i32 data (4 bytes per element)
vector<uint8_t> temp32(i32Buffer.size() * 4, 0);
copy(temp32.begin(), temp32.end(), back_inserter(u8ValuesBuf));
//for (auto i : i32Buffer) {
//    convertToU8<int32_t>(i, u8ValuesBuf);
//}

// Dummy bytes matching the float16 data (2 bytes per element)
vector<uint8_t> temp16(float16Buffer.size() * 2, 0);
copy(temp16.begin(), temp16.end(), back_inserter(u8ValuesBuf));
//for (auto i : float16Buffer) {
//    convertToU8<float16>(i, u8ValuesBuf);
//}

And the time was cut down to 3100 ms on Windows too! What could possibly be happening with reinterpret_cast?
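In case it helps narrow things down, one variant I am considering (not benchmarked yet; the name convertToU8Bulk is just for illustration) appends each value's bytes with a single insert call instead of a per-byte push_back, which should show whether the cost is in the byte-by-byte loop rather than in reinterpret_cast itself:

template <typename T>
void convertToU8Bulk(const T& inValue, std::vector<uint8_t>& outValues) {
    // Same reinterpret_cast as before, but all sizeof(T) bytes are appended in one call.
    const auto inputU8 = reinterpret_cast<const uint8_t*>(&inValue);
    outValues.insert(outValues.end(), inputU8, inputU8 + sizeof(inValue));
}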

Any help is much appreciated!

PS: Using Visual Studio 2019 on Windows
