I have a class representing one or several containers of objects. The class offers a function that runs a callback for each of the elements. A simple implementation could look like this:
#include <cstddef>
#include <functional>

struct MyData {
    Foo* foo;       // the elements
    std::size_t n;  // number of elements in foo
    void doForAllFoo(std::function<void(Foo)> fct) {
        for (std::size_t i = 0; i < n; ++i) {
            fct(foo[i]);
        }
    }
};
Driving code:
MyData d = MyData(...);
TypeX param1 = create_some_param();
TypeY param2 = create_some_more_param();
d.doForAllFoo([&](Foo f) { my_function(f, param1, param2); });
I think this is a good solution for flexible callbacks on a container.
Now I'd like to parallelize this with CUDA. I'm not quite sure what is allowed with lambdas in CUDA, and I'm also unsure how compilation for __device__ and __host__ code works here.
I can (and probably will have to) change MyData, but I'd like the driving code to show no trace of the CUDA background, except of course that the memory has to be allocated in a CUDA-accessible way.
I think a minimal example would be very helpful.