Monday, 22 June 2020

How to measure a duration with very fine resolution?

I want to measure the duration of an operation the following way:

t1 = GetCurrentTime()
// do the operation
t2 = GetCurrentTime()

return TransformToSeconds(t2 - t1)

How can I do that in C++11? I want the measurement itself to be as cheap as possible, i.e. GetCurrentTime() should be fast, and the resolution of the clock should be as fine as possible.
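For reference, the pseudocode above maps onto std::chrono roughly as follows. This is only a minimal sketch, assuming std::chrono::steady_clock as the clock; MeasureSeconds is a hypothetical helper name, not something from the standard library.

```cpp
#include <chrono>

// Hypothetical helper: runs `operation` once and returns the elapsed
// time in seconds, measured with std::chrono::steady_clock (an assumption;
// any other chrono clock could be substituted).
template <class Operation>
double MeasureSeconds(Operation operation)
{
    const std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();
    operation();
    const std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();

    // duration<double> has a period of one second, so count() is in seconds.
    return std::chrono::duration<double>(t2 - t1).count();
}
```

It would be called as MeasureSeconds([] { /* do the operation */ }). The returned value includes the cost of the two now() calls, which is exactly the overhead I am trying to minimize.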

I did some research and some measurements, and the results confuse me.

On my Windows machine (cl.exe 19.16.27035) I got consistent results with this program:

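#include <chrono>
#include <cstdint>
#include <stdio.h>
#include <type_traits>   // std::is_same_v (needs C++17, e.g. /std:c++17)
#include <Windows.h>

// QueryPerformanceCounter reports ticks as a 64-bit signed integer.
static_assert(std::is_same_v<decltype(LARGE_INTEGER::QuadPart), std::int64_t>);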
constexpr unsigned Repeat = 3000000;

const std::int64_t WindowsTicksPerSec = [] {
    LARGE_INTEGER ticksPerSec;
    QueryPerformanceFrequency(&ticksPerSec);
    return ticksPerSec.QuadPart;
}();

std::int64_t GetWindowsNow()
{
    LARGE_INTEGER ticks;
    QueryPerformanceCounter(&ticks);

    return ticks.QuadPart;   // number of "ticks"
}

double TestWindowsClock()
{
    double durationSeconds = 0.0;

    for (unsigned i = 0; i < Repeat; i++) {
        const std::int64_t t1 = GetWindowsNow();
        const std::int64_t t2 = GetWindowsNow();

        durationSeconds += double(t2 - t1) / WindowsTicksPerSec;
    }

    return durationSeconds / Repeat;
}

template <class Clock>
double TestSTLClock()
{
    double durationSeconds = 0.0;

    for (unsigned i = 0; i < Repeat; i++) {
        const typename Clock::time_point t1 = Clock::now();
        const typename Clock::time_point t2 = Clock::now();

        durationSeconds += std::chrono::duration<double>(t2 - t1).count();
    }

    return durationSeconds / Repeat;
}

void PrintMeasurements(const char* label, double durationSeconds)
{
    printf("%-21s: %7.3f ns ", label, durationSeconds * 1000000000);

    for (unsigned i = 0; i < durationSeconds * 1000000000; i++)
        printf("=");

    printf("\n");
}

int main()
{
    PrintMeasurements("Windows clock",         TestWindowsClock());
    PrintMeasurements("system_clock",          TestSTLClock<std::chrono::system_clock>());
    PrintMeasurements("steady_clock",          TestSTLClock<std::chrono::steady_clock>());
    PrintMeasurements("high_resolution_clock", TestSTLClock<std::chrono::high_resolution_clock>());

    static_assert(std::is_same_v<std::chrono::steady_clock, std::chrono::high_resolution_clock>);
}

It prints out the following results (it's more or less the same in each execution):

Windows clock        :  19.795 ns ====================
system_clock         :  30.168 ns ===============================
steady_clock         :  51.390 ns ====================================================
high_resolution_clock:  52.166 ns =====================================================

This contradicts both the common-sense answer (use high_resolution_clock) and the cppreference.com recommendation (use steady_clock). As we can see:

  • A custom implementation can be faster than any of the standard clocks.
  • high_resolution_clock is the slowest of all. As a side note, in MSVC it is the same type as steady_clock.
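Note that the program above measures the cost of calling now(), not the resolution of the clocks. The declared resolution can be inspected separately. A minimal sketch, assuming the nominal tick length taken from Clock::period is informative (the real granularity of the underlying timer may be coarser); NominalTickNanoseconds and PrintClockProperties are hypothetical helper names:

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: nominal tick length of Clock in nanoseconds,
// computed from the compile-time ratio Clock::period (seconds per tick).
template <class Clock>
constexpr double NominalTickNanoseconds()
{
    return 1e9 * Clock::period::num / Clock::period::den;
}

// Hypothetical helper: prints the nominal tick length and whether the
// clock is declared steady (monotonic, never adjusted).
template <class Clock>
void PrintClockProperties(const char* label)
{
    std::printf("%-21s: tick = %g ns, is_steady = %s\n",
                label,
                NominalTickNanoseconds<Clock>(),
                Clock::is_steady ? "yes" : "no");
}
```

Calling PrintClockProperties<std::chrono::system_clock>("system_clock") and likewise for the other two clocks shows what each implementation claims; the values differ between standard libraries, so they are worth checking on each target.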

If I want to measure the duration of an operation in a portable way, the picture is even more complicated, because a different method will be the best with each compiler. To compare results on a Linux machine, use this program on Godbolt. Note that Godbolt is very unreliable for this: each execution gives significantly different results. This one on Wandbox is more stable. Curiously enough, turning on optimizations gives worse results.
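One possible shape for a portable wrapper is sketched below. It assumes QueryPerformanceCounter on Windows (the fastest option in my measurement) and falls back to std::chrono::steady_clock everywhere else; whether that fallback is the best choice on other platforms is exactly what I do not know. PortableNowTicks and PortableTicksToSeconds are hypothetical names.

```cpp
#include <chrono>
#include <cstdint>
#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical helper: current time as an integer tick count.
// The tick unit is platform-specific; only differences are meaningful.
std::int64_t PortableNowTicks()
{
#ifdef _WIN32
    LARGE_INTEGER ticks;
    QueryPerformanceCounter(&ticks);
    return ticks.QuadPart;
#else
    return static_cast<std::int64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Hypothetical helper: converts a tick difference to seconds.
// Subtracting in the integer domain first avoids losing precision
// when two large timestamps are converted to double.
double PortableTicksToSeconds(std::int64_t tickDelta)
{
#ifdef _WIN32
    static const double secondsPerTick = []() -> double {
        LARGE_INTEGER ticksPerSec;
        QueryPerformanceFrequency(&ticksPerSec);
        return 1.0 / double(ticksPerSec.QuadPart);
    }();
    return double(tickDelta) * secondsPerTick;
#else
    typedef std::chrono::steady_clock::period Period;   // seconds per tick
    return double(tickDelta) * Period::num / Period::den;
#endif
}
```

Usage follows the pseudocode at the top: take t1 and t2 with PortableNowTicks() and return PortableTicksToSeconds(t2 - t1).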
