Tuesday, 10 September 2019

Common buffer allocation for efficient OpenMP and OpenCL performance

I am building a library that will simultaneously perform different compute-intensive operations on a vector on the CPU side and the GPU side, using OpenMP and OpenCL. The problem is that when I override the vector's allocator for proper alignment (starting address a multiple of 4096, plus 64-byte cache alignment) to enable zero-copy, OpenMP performance suffers because the vector is no longer optimized for SSE and AVX instructions. How can I write a custom allocator for an STL vector so that it can be used both by OpenMP with SSE/AVX2 for the CPU-side work and by OpenCL with zero-copy for the GPU-side work?
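For reference, the kind of allocator I mean looks roughly like the sketch below. This is a minimal illustration, not my actual library code: the name `PageAlignedAllocator`, its constants, and the helper `padded_bytes` are hypothetical, and it assumes a POSIX system where `posix_memalign` is available. It returns storage whose start address is a multiple of 4096 and whose size in bytes is rounded up to a multiple of 64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <stdlib.h>   // posix_memalign, free (POSIX; an assumption of this sketch)
#include <vector>

// Hypothetical allocator: page-aligned start address, size padded to 64 bytes.
template <class T>
struct PageAlignedAllocator {
    using value_type = T;

    static constexpr std::size_t kAlignment = 4096; // start-address alignment
    static constexpr std::size_t kGranule   = 64;   // size granularity (cache line)

    PageAlignedAllocator() noexcept = default;
    template <class U>
    PageAlignedAllocator(const PageAlignedAllocator<U>&) noexcept {}

    // Bytes actually requested for n elements: rounded up to a multiple of
    // kGranule, and never zero.
    static std::size_t padded_bytes(std::size_t n) {
        std::size_t bytes = n * sizeof(T);
        std::size_t padded = (bytes + kGranule - 1) / kGranule * kGranule;
        return padded == 0 ? kGranule : padded;
    }

    T* allocate(std::size_t n) {
        // Guard against overflow in n * sizeof(T) plus padding.
        if (n > (std::numeric_limits<std::size_t>::max() - kGranule) / sizeof(T))
            throw std::bad_alloc();
        void* p = nullptr;
        if (posix_memalign(&p, kAlignment, padded_bytes(n)) != 0)
            throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
        // Debug-only sanity check that the pointer came from allocate().
        assert(reinterpret_cast<std::uintptr_t>(p) % kAlignment == 0);
        free(p);
    }
};

// Stateless allocator: all instances are interchangeable.
template <class T, class U>
bool operator==(const PageAlignedAllocator<T>&, const PageAlignedAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const PageAlignedAllocator<T>&, const PageAlignedAllocator<U>&) noexcept {
    return false;
}

// Convenience alias (hypothetical name).
template <class T>
using AlignedVector = std::vector<T, PageAlignedAllocator<T>>;
```

An `AlignedVector<float> v(n)` built this way has `v.data()` on a 4096-byte boundary, so the same pointer could in principle be handed to `clCreateBuffer` with `CL_MEM_USE_HOST_PTR` on the GPU side and iterated over in an OpenMP loop on the CPU side. The sketch only guarantees the memory layout; it does not tell the compiler about the alignment, so it does not by itself explain or fix the vectorization slowdown described above.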
