So I am converting vectors of int32 and float16 values to a vector of uint8_t using the following code:
// Appends the raw bytes of inValue to outValues, one byte at a time.
template <typename T>
void convertToU8(const T& inValue, std::vector<uint8_t>& outValues) {
    const auto* inputU8 = reinterpret_cast<const uint8_t*>(&inValue);
    for (size_t i = 0; i < sizeof(inValue); ++i)
        outValues.push_back(inputU8[i]);
}
vector<uint8_t> u8ValuesBuf;
u8ValuesBuf.reserve(estimatedSize);
for (auto i : i32Buffer) {
    convertToU8<int32_t>(i, u8ValuesBuf);
}
for (auto i : float16Buffer) {
    convertToU8<float16>(i, u8ValuesBuf);
}
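For what it's worth, I'd expect a bulk-append variant like the following sketch to produce the same bytes, assuming both buffers are contiguous std::vectors (this is not what I'm currently running):
// Sketch, not my current code: append all bytes of a buffer at once
// instead of doing one push_back per byte.
const auto* bytes32 = reinterpret_cast<const uint8_t*>(i32Buffer.data());
u8ValuesBuf.insert(u8ValuesBuf.end(), bytes32,
                   bytes32 + i32Buffer.size() * sizeof(int32_t));
const auto* bytes16 = reinterpret_cast<const uint8_t*>(float16Buffer.data());
u8ValuesBuf.insert(u8ValuesBuf.end(), bytes16,
                   bytes16 + float16Buffer.size() * sizeof(float16));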
At runtime, this takes 30000+ ms on Windows, while the same code on Linux is much faster and finishes everything in under 3100 ms. There is no conditional compilation that could add extra work on Windows.
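By timing I mean wall-clock measurement around just the conversion loops; a minimal sketch of that kind of measurement (the timing code here is illustrative, not my exact harness):
#include <chrono>
// Illustrative: measure only the conversion loops.
const auto t0 = std::chrono::steady_clock::now();
for (auto i : i32Buffer)     convertToU8<int32_t>(i, u8ValuesBuf);
for (auto i : float16Buffer) convertToU8<float16>(i, u8ValuesBuf);
const auto t1 = std::chrono::steady_clock::now();
const auto elapsedMs =
    std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count();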
To isolate the issue, I replaced the for loops as shown below, copying from dummy uint8_t vectors of the same total size:
vector<uint8_t> u8ValuesBuf;
u8ValuesBuf.reserve(estimatedSize);
vector<uint8_t> temp32(i32Buffer.size() * 4, 0);
copy(temp32.begin(), temp32.end(), back_inserter(u8ValuesBuf));
//for (auto i : i32Buffer) {
//    convertToU8<int32_t>(i, u8ValuesBuf);
//}
vector<uint8_t> temp16(float16Buffer.size() * 2, 0);
copy(temp16.begin(), temp16.end(), back_inserter(u8ValuesBuf));
//for (auto i : float16Buffer) {
//    convertToU8<float16>(i, u8ValuesBuf);
//}
And the time was cut down to 3100 ms on Windows too! What could possibly be happening with reinterpret_cast?
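If it helps narrow things down, here is a variant I could try that keeps the reinterpret_cast but avoids per-byte push_back by resizing once and writing through an index (a sketch, untested):
// Sketch: same cast as before, but preallocate once and write by index
// so push_back is out of the picture.
u8ValuesBuf.resize(i32Buffer.size() * sizeof(int32_t)
                 + float16Buffer.size() * sizeof(float16));
size_t pos = 0;
for (auto v : i32Buffer) {
    const auto* b = reinterpret_cast<const uint8_t*>(&v);
    for (size_t i = 0; i < sizeof(v); ++i)
        u8ValuesBuf[pos++] = b[i];
}
for (auto v : float16Buffer) {
    const auto* b = reinterpret_cast<const uint8_t*>(&v);
    for (size_t i = 0; i < sizeof(v); ++i)
        u8ValuesBuf[pos++] = b[i];
}
// If this runs as fast as the dummy copies, the cost is in the per-byte
// push_back, not in reinterpret_cast itself.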
Any help is much appreciated!
PS: Using Visual Studio 2019 on Windows