I have read https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-bar, which describes the PTX synchronization instructions.
- It says there are 16 "barrier logical resources", and you can specify which barrier to use with the operand "a". What is a barrier logical resource?
- I have a piece of code from an outside source, which I know works. However, I cannot understand the syntax used inside "asm", or what "memory" does. I assume "name" replaces "%0" and "numThreads" replaces "%1", but what is "memory", and what are the colons doing?

    __device__ __forceinline__ void namedBarrierSync(int name, int numThreads) {
        asm volatile("bar.sync %0, %1;" : : "r"(name), "r"(numThreads) : "memory");
    }
- In a block of 256 threads, I only want threads 64 ~ 127 to synchronize. Is this possible with the barrier.sync function? For example, say I have a grid of 1 block and a block of 256 threads. We split the block into 3 conditional branches such that threads 0 ~ 63 go into kernel 1, threads 64 ~ 127 go into kernel 2, and threads 128 ~ 255 go into kernel 3. I want the threads in kernel 2 to synchronize only among themselves. If I use the "namedBarrierSync" function defined above and call "namedBarrierSync(1, 64)", does it synchronize only threads 64 ~ 127, or threads 0 ~ 63?
- I have tested with the code below (assume that gpuAssert is an error-checking function defined somewhere in the file). For some reason I can't format the code correctly, so I took a screenshot.
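
For reference, the scenario from the third question can be sketched roughly as follows. This is not the code from the screenshot. The kernel name `branchKernel`, the host helper `runNamedBarrierDemo`, and the shared buffer layout are made up for illustration. It assumes that barrier 1 is not used anywhere else in the block, that threads 64 ~ 127 make up two complete warps, and that the barrier counts whichever threads actually execute it, as I understand the PTX documentation to say.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Named barrier from the question. Extended-asm layout:
//   asm("template" : output operands : input operands : clobbers);
// %0 and %1 refer to the operands in the order they are listed, and "r"
// asks for a 32-bit register. The "memory" clobber tells the compiler the
// statement may read or write memory, so it must not keep values cached in
// registers or move memory accesses across it.
__device__ __forceinline__ void namedBarrierSync(int name, int numThreads) {
    asm volatile("bar.sync %0, %1;" : : "r"(name), "r"(numThreads) : "memory");
}

// Hypothetical kernel: one block of 256 threads split into three branches.
__global__ void branchKernel(int *out) {
    __shared__ int buf[64];
    int tid = threadIdx.x;

    if (tid < 64) {
        // "kernel 1": never touches barrier 1
    } else if (tid < 128) {
        // "kernel 2": warps 2 and 3 exchange data through shared memory
        int local = tid - 64;
        buf[local] = tid;
        namedBarrierSync(1, 64);             // waits until 64 threads arrive at barrier 1
        out[local] = buf[(local + 1) % 64];  // read the neighbour's value
    } else {
        // "kernel 3": never touches barrier 1
    }
}

// Hypothetical host helper: returns the number of mismatching entries,
// or -1 if a CUDA call fails.
int runNamedBarrierDemo() {
    int *dOut = nullptr;
    int hOut[64];
    if (cudaMalloc(&dOut, sizeof(hOut)) != cudaSuccess) return -1;

    branchKernel<<<1, 256>>>(dOut);

    // cudaMemcpy blocks until the kernel has finished on the default stream.
    cudaError_t err = cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    if (err != cudaSuccess) return -1;

    int mismatches = 0;
    for (int i = 0; i < 64; ++i) {
        if (hOut[i] != 64 + (i + 1) % 64) ++mismatches;
    }
    return mismatches;
}

int main() {
    printf("mismatches: %d\n", runNamedBarrierDemo());
    return 0;
}
```

The sketch uses barrier 1 rather than 0 because, as far as I know, `__syncthreads()` uses barrier 0. The thread count passed to the barrier is 64, a multiple of the warp size, which the PTX documentation requires.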