I have a working CPU implementation of a simple deep learning framework. The main components are nodes of a computation graph that perform computations on tensors.
Now I need to extend the implementation to the GPU. I would like to keep the existing class structure and only extend its functionality to the GPU; however, I'm not sure whether that is even possible.
Most of the classes have methods that work on and return tensors, such as

    tensor_ptr get_output();

where tensor_ptr is simply a std::shared_ptr to my tensor class. What I would like to do is add a GPU version of each such method. The idea I had in mind was to define a struct in a separate file, tensor_gpu.cuh, as follows:
    struct cu_shape {
        int n_dims;
        int x, y, z;
        int len;
    };

    struct cu_tensor {
        __device__ float *array;
        cu_shape shape;
    };
and the previous function would then be mirrored by:

    cu_tensor cu_get_output();

The problem is that the .cuh file gets treated as a regular header: it is compiled by the default C++ compiler, which reports

    error: attribute "device" does not apply here

on the line declaring __device__ float *array.
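If I simply drop the qualifier, I assume the header would at least parse under the host compiler, since to the host a device pointer is just an ordinary address; a sketch of what I mean (untested):

    struct cu_tensor {
        float *array;   // device pointer, to be filled in by cudaMalloc inside a .cu file
        cu_shape shape;
    };

but I am not sure whether that is the intended way to do it.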
I am aware that you cannot mix CUDA and pure C++ code in the same translation unit, so my plan was to hide all the CUDA runtime API calls in .cu files and expose them through declarations in .h files. The problem is that I want to store the device pointers inside my classes and then pass them to the CUDA-calling functions. This way I could keep all of my existing object structure and only modify the initialization and computation parts, roughly as sketched below.
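Concretely, the split I was picturing looks something like this; the wrapper names and the ReLU example are made up just to illustrate the idea, not code from my framework:

    // tensor_gpu.h -- plain C++ header, included by my node classes
    #pragma once

    struct cu_shape {
        int n_dims;
        int x, y, z;
        int len;
    };

    struct cu_tensor {
        float   *array;   // device pointer; host code only stores and passes it around
        cu_shape shape;
    };

    // wrappers implemented in tensor_gpu.cu and compiled by nvcc
    cu_tensor cu_alloc_tensor(cu_shape shape);
    void      cu_free_tensor(cu_tensor t);
    void      cu_copy_to_device(cu_tensor dst, const float *src_host);
    void      cu_relu_forward(cu_tensor in, cu_tensor out);

and the corresponding implementation file:

    // tensor_gpu.cu -- the only place where CUDA keywords and runtime calls appear
    #include "tensor_gpu.h"
    #include <cuda_runtime.h>

    __global__ void relu_kernel(const float *in, float *out, int len) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < len) out[i] = in[i] > 0.0f ? in[i] : 0.0f;
    }

    cu_tensor cu_alloc_tensor(cu_shape shape) {
        cu_tensor t;
        t.shape = shape;
        cudaMalloc(&t.array, shape.len * sizeof(float));
        return t;
    }

    void cu_free_tensor(cu_tensor t) { cudaFree(t.array); }

    void cu_copy_to_device(cu_tensor dst, const float *src_host) {
        cudaMemcpy(dst.array, src_host, dst.shape.len * sizeof(float),
                   cudaMemcpyHostToDevice);
    }

    void cu_relu_forward(cu_tensor in, cu_tensor out) {
        int threads = 256;
        int blocks  = (in.shape.len + threads - 1) / threads;
        relu_kernel<<<blocks, threads>>>(in.array, out.array, in.shape.len);
    }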
If a regular C++ class cannot touch anything carrying a __device__ qualifier, how do you integrate CUDA code into C++ code at all? Can CUDA runtime calls and keywords literally only be used in .cu files? Or is there some clever way of hiding from the C++ compiler the fact that it is dealing with CUDA device pointers?
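To make the last question concrete, what I would ideally like my existing node classes to look like is roughly this (again, made-up names, just to show the intent):

    // relu_node.h -- plain C++, compiled by the host compiler, no CUDA keywords
    #include "tensor_gpu.h"

    class relu_node {
    public:
        cu_tensor cu_get_output() const { return output_; }

        void cu_forward(const cu_tensor &input) {
            // only calls the plain wrapper declared in tensor_gpu.h
            cu_relu_forward(input, output_);
        }

    private:
        cu_tensor output_;  // holds a device pointer that host code never dereferences
    };

Is something along these lines feasible?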
Any insight is deeply appreciated!