I'm working with CUDA and C++11 (I don't think CUDA supports later C++ versions yet). I have a closure object that is passed to the function Process(), which calls the closure on each iteration.
I understand that std:: functionality is generally not available in CUDA device code. For example, when I try to use std::function< float(uint32_t) >, I get this error:
error: calling a host function("std::function ::function< ::, void, void> ") from a global function("_NV_ANON_NAMESPACE::LargeKernel") is not allowed
What can I replace lookupFunc with so that this compiles without std::function being available? I was able to work around this by creating a function template to deduce the type of the lambda function.
This code works and shows the workaround I've employed:
    //using lookupFunc = std::function< float(uint32_t) >;

    template< typename Lambda > // Workaround: a function template deduces the lambda's type
    __device__
    void Process(float * const outData,
                 const int32_t locationX,
                 const Lambda /* lookupFunc */ lambda)
    {
        float answer = 0.f;
        // offset runs from 0 to 1023 inclusive
        for( int32_t offset = -1 ; ++offset < 1024 ; )
        {
            const float value = lambda( offset );
            answer += value;
        }
        outData[ locationX ] = answer;
    }

    __global__
    void LargeKernel(const float * const inData,
                     float * const outData)
    {
        constexpr uint32_t cellStride = 1;
        const int32_t locationX = threadIdx.x + blockDim.x * blockIdx.x;
        // The closure captures per-thread state by value
        const auto lambda
            = [locationX, inData, cellStride](const int32_t offset)
              {
                  return inData[ locationX + offset + cellStride ];
              };
        Process( outData, locationX, lambda );
    }
I also tried:
using lookupFunc = float(* const)(uint32_t);
But that gives this error:
error: no suitable conversion function from "const lambda ->float" to "float (*)(uint32_t)" exists
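As far as I can tell, this error follows from a general C++ rule rather than anything CUDA-specific: only a lambda with no captures converts implicitly to a plain function pointer, and my lambda captures locationX, inData and cellStride. Below is a minimal host-only sketch of that rule. The helpers callLookup, demoCaptureless and capturingConverts are hypothetical names introduced only for this illustration, and the lookupFunc alias drops the top-level const from my attempt above for simplicity.

```cpp
#include <cstdint>
#include <type_traits>

// Function-pointer form of the alias (illustrative, top-level const omitted).
using lookupFunc = float (*)(std::uint32_t);

// Hypothetical helper: calls through the function pointer.
inline float callLookup(lookupFunc f, std::uint32_t offset)
{
    return f(offset);
}

// A lambda with an empty capture list converts to a function pointer.
inline float demoCaptureless()
{
    const auto noCapture = [](std::uint32_t offset)
    {
        return static_cast<float>(offset) * 2.0f;
    };
    static_assert(std::is_convertible<decltype(noCapture), lookupFunc>::value,
                  "a captureless lambda converts to a function pointer");
    return callLookup(noCapture, 21u);
}

// A lambda that captures state has no such conversion, so this reports false.
inline bool capturingConverts()
{
    const float scale = 2.0f;
    const auto withCapture = [scale](std::uint32_t offset)
    {
        return scale * static_cast<float>(offset);
    };
    (void)withCapture; // only its type is inspected
    return std::is_convertible<decltype(withCapture), lookupFunc>::value;
}
```

So the function-pointer alias could only work if the lambda captured nothing, which would mean passing locationX, inData and cellStride some other way.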
How can I declare the type of the third argument to Process() without using a template?
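One direction I can sketch, without claiming it is the best or the canonical answer, is to write the closure out by hand as a named functor, so that the third argument of Process() has a concrete type. This is roughly what the compiler generates for the lambda anyway. The struct name CellLookup and the HOST_DEVICE macro are hypothetical names of my own. The macro only exists so the same code also compiles as plain C++ on the host, and the kernel is guarded so it is only seen by nvcc.

```cpp
#include <cstdint>

// Hypothetical macro: expands to CUDA qualifiers under nvcc, to nothing otherwise.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical named functor holding the same state the lambda captured.
struct CellLookup
{
    const float*  inData;
    std::int32_t  locationX;
    std::uint32_t cellStride;

    HOST_DEVICE
    float operator()(const std::int32_t offset) const
    {
        return inData[locationX + offset + cellStride];
    }
};

// Non-template Process: the third argument now has a concrete, nameable type.
HOST_DEVICE
inline void Process(float* const outData,
                    const std::int32_t locationX,
                    const CellLookup lookup)
{
    float answer = 0.f;
    for (std::int32_t offset = 0; offset < 1024; ++offset)
    {
        answer += lookup(offset);
    }
    outData[locationX] = answer;
}

#ifdef __CUDACC__
__global__
void LargeKernel(const float* const inData, float* const outData)
{
    const std::int32_t locationX = threadIdx.x + blockDim.x * blockIdx.x;
    // Build the functor with the values the lambda used to capture.
    Process(outData, locationX, CellLookup{inData, locationX, 1u});
}
#endif
```

The trade-off is that Process() now accepts exactly one kind of lookup. Unlike std::function there is no type erasure, so a second lookup strategy would need its own functor type and an overload, or a return to the template.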