During implementation of CUDA code I often need some utility functions, which will be called from device and also from host code. So I declare these functions as __host__ __device__. This is OK and possible device/host incompabilities can be handled by #ifdef CUDA_ARCH.
Problems come when the utility function is templated ie. by some functor type. If the template instance calls __host__ function I get this warning:
calling a __host__ function from a __host__ __device__ function is not allowed
Only solution I know is to define the function twice - once for device and once for host code with different name (I cannot overload on __host__ __device__). But this means that there is code duplication and all other __host__ __device__ functions which will call it, must be also defined twice (even more code duplication).
Simplified example:
#include <cuda.h>
#include <iostream>
struct HostObject {
__host__
int value() const { return 42; }
};
struct DeviceObject {
__device__
int value() const { return 3; }
};
template <typename T>
__host__ __device__
int foo(const T &obj) {
return obj.value();
}
/*
template <typename T>
__host__
int foo_host(const T &obj) {
return obj.value();
}
template <typename T>
__device__
int foo_device(const T &obj) {
return obj.value();
}
*/
__global__ void kernel(int *data) {
data[threadIdx.x] = foo(DeviceObject());
}
int main() {
foo(HostObject());
int *data;
cudaMalloc((void**)&data, sizeof(int) * 64);
kernel<<<1, 64>>>(data);
cudaThreadSynchronize();
cudaFree(data);
}
foo_host<> and foo_device<> are possible replacements for the problematic foo<>.
Is there a better solution?
Aucun commentaire:
Enregistrer un commentaire