I have problems measuring the time of a simple OpenCL kernel on Arch Linux (Manjaro). OpenCL's means to measure time give me an time of 0 seconds or give an error, depending on the conditions.
My problem
When executing a passthough kernel (basically out[i] = in[i];
) on a big NDRange
, the CPU measures the correct execution time, while OpenCL reports 0. Here is the console output:
CPU: 1.27449 s
GPU: 0 s
passthrough.cpp (excerpts)
#include <CL/cl2.hpp>
#include "../Timer.hpp"
// start timer
Timer t;
// enqueue blocking
cl::Event processingEvent;
cl_int err = device->_queue.enqueueNDRangeKernel(
passthrough, // kernel
cl::NullRange, // offset
cl::NDRange(bufferSize), // global range
{}, // local range
nullptr, // wait for events
&processingEvent // processing event
);
processingEvent.wait();
// measure time
double cpuUsec = t.deltaUsec();
double gpuUsec = Timer::deltaUsec(processingEvent);
// debug
cout << "CPU: " << cpuUsec / 1000000.0 << " s\n";
cout << "GPU: " << gpuUsec / 1000000.0 << " s\n";
Timer.cpp (excerpts)
#include <chrono>
#include <ratio>
#include <CL/cl2.hpp>
Timer::Timer() {
from = high_resolution_clock::now();
}
double Timer::deltaUsec() {
return duration_cast<duration<double, std::ratio<1, 1000000>>>(high_resolution_clock::now() - from).count();
}
double Timer::deltaUsec(const cl::Event &event) {
return (double) Timer::deltaNsec(event) / 1000.0;
}
// main function for the timing
unsigned long Timer::deltaNsec(const cl::Event &event) {
auto end = event.getProfilingInfo<CL_PROFILING_COMMAND_END>();
auto start = event.getProfilingInfo<CL_PROFILING_COMMAND_START>();
return end - start;
}
What I have tried
- I double checked the input values to be equal to the output - so the kernel actually works.
- I deactivated compiler optimizations - just in case they cause the GPU to use the same buffer for in- and output values.
- I added some logic so the code acutally needs more time to execute.
- I checked in LLDB for the
start
andend
variables inTimer::deltaNsec()
not to be 0. They aren't and increase everytime I check. - After adding
#define CL_HPP_ENABLE_EXCEPTIONS
before#import cl2.hpp
I got the following SIGABRT error:
terminate called after throwing an instance of 'cl::Error'
what(): clGetEventProfileInfo
LLDB hinted me to the line auto end = event.getProfilingInfo<CL_PROFILING_COMMAND_END>();
in Timer.cpp
which I think is really confusing.
According to this answer I used the timer correctly - did I?
Some more info on my system / environment
Used OpenCL driver (excerpts from custom YAML):
NVIDIA CUDA:
Vendor: NVIDIA Corporation
Version: OpenCL 1.2 CUDA 10.1.120
Devices:
GeForce GTX 1080 Ti:
Timer resolution (ns): 1000
Vendor: NVIDIA Corporation
Version: OpenCL 1.2 CUDA
Driver: 430.64
Installed packages
lib32-opencl-nvidia 1:430.40-1
opencl-headers 2:2.2.20170516-2
Maybe somebody had a similar problem and can help.
...and I hope I provided enough info - if not, please ask!
Aucun commentaire:
Enregistrer un commentaire