Tuesday, August 28, 2018

Recommended way to populate an array stored in unique_ptr

I am trying to determine the best-practice approach (in terms of performance and style) for populating an array stored in a unique_ptr. For reference, I am testing on Windows 10 Pro, version 1607, with an Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50 GHz and 16 GB RAM, using Microsoft Visual Studio Enterprise 2017 15.8.2. The code I am testing is:

#include <iostream>
#include <chrono>
#include <memory>   // std::make_unique, std::unique_ptr

using namespace std;
using namespace std::chrono;

constexpr unsigned __int64 NUMBER_OF_BYTES = 1024 * 1024 * 10;

int main()
{
    const auto buffer = make_unique<char[]>(NUMBER_OF_BYTES);
    const auto start_time = high_resolution_clock::now();

    for (unsigned __int64 i = 0; i < NUMBER_OF_BYTES; i++)
    {
        buffer[i] = static_cast<char>(i);
    }

    const auto end_time = high_resolution_clock::now();
    const duration<float> time_delta = end_time - start_time;
    const auto duration = duration_cast<milliseconds>(time_delta);
    cout << "Duration: " << duration.count() << endl;

    return 0;
}

This takes 1161 ms to complete. When I change the loop to:

char* raw_pointer = buffer.get();
for (unsigned __int64 i = 0; i < NUMBER_OF_BYTES; i++)
{
    raw_pointer[i] = static_cast<char>(i);
}

the execution time drops to 36 ms, which is roughly 32 times faster. The results are the same for x86 and x64 targets. In the generated assembly I can see that, with operator[] on the unique_ptr, every iteration of the loop goes through a wrapper call to reach the pointer, and I assume that is what causes the performance difference. These numbers were measured without any compiler optimization. With optimization enabled, the wrapper is optimized away and both approaches take the same amount of time: 22 ms. So I do understand where the performance difference comes from (the wrapper).
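To make the "wrapper" concrete, here is a minimal sketch of what indexing through a smart pointer amounts to. The class tiny_array_ptr is hypothetical and is not MSVC's actual unique_ptr implementation; it only assumes, as the C++ standard specifies for unique_ptr<T[]>, that operator[](i) returns get()[i]. In an unoptimized build, each such member function call is typically emitted as a real call rather than inlined.

```cpp
#include <cstddef>

// Hypothetical, simplified stand-in for std::unique_ptr<char[]>.
// It exists only to show the shape of the indexing wrapper.
class tiny_array_ptr
{
public:
    // Allocates `count` value-initialized (zeroed) chars.
    explicit tiny_array_ptr(std::size_t count) : data_(new char[count]()) {}
    ~tiny_array_ptr() { delete[] data_; }

    // Non-copyable, like unique_ptr.
    tiny_array_ptr(const tiny_array_ptr&) = delete;
    tiny_array_ptr& operator=(const tiny_array_ptr&) = delete;

    // Returns the owned raw pointer.
    char* get() const { return data_; }

    // Indexing forwards to get(): without inlining, every buffer[i]
    // in a loop pays for these function calls.
    char& operator[](std::size_t i) const { return get()[i]; }

private:
    char* data_;
};
```

Writing through p[i] and through p.get()[i] reaches the same element; the only difference is how many function calls sit between the loop body and the store.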

Now, for reasons that are not important here, I do not want to enable optimization in my compiler. In that case, what is the recommended way to populate my array efficiently? Is it to use the raw pointer from get(), as I did, or is there a better, more canonical way that is in the spirit of modern C++? Thank you.
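One possible arrangement, offered as a sketch rather than a definitive answer, is to keep ownership in the unique_ptr and pass the raw pointer once to a helper that does the filling. The names fill_with_index_bytes and make_filled_buffer are hypothetical and are introduced here only for illustration; the loop body is the one from the question, and std::size_t stands in for unsigned __int64.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper: writes the low byte of each index into data[0..count).
// It takes a non-owning raw pointer, so the smart pointer's operator[]
// is never called inside the loop.
void fill_with_index_bytes(char* data, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        data[i] = static_cast<char>(i);
    }
}

// Hypothetical helper: allocates the buffer and fills it, calling get() once.
std::unique_ptr<char[]> make_filled_buffer(std::size_t count)
{
    auto buffer = std::make_unique<char[]>(count);
    fill_with_index_bytes(buffer.get(), count);
    return buffer;
}
```

In the program from the question, this would be called as fill_with_index_bytes(buffer.get(), NUMBER_OF_BYTES). The unique_ptr still owns the memory; the raw pointer is only a temporary, non-owning view, which is a common convention for passing buffers to functions. Whether this matches the 36 ms figure in an unoptimized build would need to be measured.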
