In CUDA you can specify template parameters that make the compiler generate completely separate versions of a kernel. The catch is that you can only pass compile-time constants as template arguments, so that the compiler knows ahead of time exactly which versions of the kernel need to be created. For instance, you can have a template parameter int X, then use an if(X==4){this}else{that}, and each value of X you instantiate gets its own function, none of which has the overhead of the 'if' statement.
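Here's a minimal sketch of what that could look like. The names `transform` and `run_transform` and the arithmetic inside each branch are made up for illustration, and error checking is left out to keep it short. It also relies on the compiler folding away the dead branch, which optimizing builds do in practice for a constant condition like this.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: X is a template parameter, so it is fixed at compile time.
template <int X>
__global__ void transform(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // The condition is a compile-time constant, so each instantiation
    // is expected to keep only one of the two branches.
    if (X == 4) {
        data[i] *= 4.0f;   // "this"
    } else {
        data[i] += 1.0f;   // "that"
    }
}

// Hypothetical host helper: copies the data to the GPU, runs one of the
// two kernel versions, and copies the result back.
std::vector<float> run_transform(std::vector<float> host, int x)
{
    int n = static_cast<int>(host.size());
    if (n == 0) return host;

    size_t bytes = host.size() * sizeof(float);
    float* dev = nullptr;
    cudaMalloc(&dev, bytes);
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // The runtime value x has to be mapped to a constant template argument,
    // so every version we might need is named explicitly here.
    if (x == 4) {
        transform<4><<<blocks, threads>>>(dev, n);
    } else {
        transform<0><<<blocks, threads>>>(dev, n);
    }

    // cudaMemcpy waits for the kernel to finish before copying back.
    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

Calling `run_transform({1.0f, 2.0f}, 4)` runs the multiply version, and any other value of `x` runs the add version. The host-side `if` is still a real branch, but it runs once per launch rather than once per thread.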
I've found this to be invaluable in allowing great flexibility and code re-usability without sacrificing performance.
Bonus points if you can point out that branches don't have that much overhead. I never knew that! ;)