Sunday, January 5, 2020

OpenCL Kernel timing measures 0 seconds or causes SIGABRT

I'm having trouble measuring the execution time of a simple OpenCL kernel on Arch Linux (Manjaro). OpenCL's profiling facilities either report a time of 0 seconds or raise an error, depending on the conditions.

My problem

When executing a passthrough kernel (basically out[i] = in[i];) on a big NDRange, the CPU measures the correct execution time, while OpenCL reports 0. Here is the console output:

CPU: 1.27449 s
GPU: 0 s

passthrough.cpp (excerpts)

#include <CL/cl2.hpp>
#include "../Timer.hpp"

// start timer
Timer t;

// enqueue blocking
cl::Event processingEvent;
cl_int err = device->_queue.enqueueNDRangeKernel(
        passthrough, // kernel
        cl::NullRange, // offset
        cl::NDRange(bufferSize), // global range
        cl::NullRange, // local range
        nullptr, // wait for events
        &processingEvent // processing event
);
processingEvent.wait();

// measure time
double cpuUsec = t.deltaUsec();
double gpuUsec = Timer::deltaUsec(processingEvent);

// debug
cout << "CPU: " << cpuUsec / 1000000.0 << " s\n";
cout << "GPU: " << gpuUsec / 1000000.0 << " s\n";

Timer.cpp (excerpts)

#include <chrono>
#include <ratio>
#include <CL/cl2.hpp>

using namespace std::chrono;

Timer::Timer() {
    from = high_resolution_clock::now();
}

double Timer::deltaUsec() {
    return duration_cast<duration<double, std::ratio<1, 1000000>>>(high_resolution_clock::now() - from).count();
}

double Timer::deltaUsec(const cl::Event &event) {
    return (double) Timer::deltaNsec(event) / 1000.0;
}

// main function for the timing
unsigned long Timer::deltaNsec(const cl::Event &event) {
    auto end = event.getProfilingInfo<CL_PROFILING_COMMAND_END>();
    auto start = event.getProfilingInfo<CL_PROFILING_COMMAND_START>();
    return end - start;
}

What I have tried

  1. I double-checked that the output values equal the input values - so the kernel actually works.
  2. I deactivated compiler optimizations - just in case they cause the GPU to use the same buffer for input and output values.
  3. I added some logic so the code actually needs more time to execute.
  4. I checked in LLDB that the start and end variables in Timer::deltaNsec() are not 0. They aren't, and they increase every time I check.
  5. After adding #define CL_HPP_ENABLE_EXCEPTIONS before including cl2.hpp, I got the following SIGABRT error:
terminate called after throwing an instance of 'cl::Error'
  what():  clGetEventProfilingInfo

LLDB pointed me to the line auto end = event.getProfilingInfo<CL_PROFILING_COMMAND_END>(); in Timer.cpp, which I find really confusing.

According to this answer, I used the timer correctly. Did I?


Some more info on my system / environment

OpenCL driver in use (excerpt from a custom YAML dump):

    NVIDIA CUDA:
      Vendor: NVIDIA Corporation
      Version: OpenCL 1.2 CUDA 10.1.120
      Devices:
        GeForce GTX 1080 Ti:
          Timer resolution (ns): 1000
          Vendor: NVIDIA Corporation
          Version: OpenCL 1.2 CUDA
          Driver: 430.64

Installed packages

lib32-opencl-nvidia 1:430.40-1
opencl-headers 2:2.2.20170516-2

Maybe somebody has had a similar problem and can help.

...and I hope I provided enough info - if not, please ask!
