This question is related to this: Optimal uint8_t bitmap into a 8 x 32bit SIMD "bool" vector
I would like to create an optimal function with this signature:
__m256 PackLeft(__m256 inputVector, __m256 boolVector);
The desired behaviour is that on an input like this:
inputVector = {42, 17, 13, 3}
boolVector = {true, false, true, false}
It masks out every value whose corresponding entry in boolVector is false and then repacks the remaining values to the left. For the input above, the return value should be:
{42, 13, , }
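To pin down the semantics, here is a minimal scalar reference sketch in C. The name pack_left_scalar and its pointer-based signature are assumptions made for illustration and are not part of the desired SIMD interface. Output lanes past the returned count are left untouched, which matches the unspecified trailing lanes in the example above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scalar reference: copies every input[i] whose keep[i] is true
 * to the front of output, preserving order, and returns how many were kept.
 * Elements of output beyond the returned count are not written. */
size_t pack_left_scalar(const float *input, const bool *keep, size_t n,
                        float *output)
{
    assert(input != NULL && keep != NULL && output != NULL);

    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (keep[i]) {
            output[count++] = input[i];
        }
    }
    return count;
}
```

A reference like this is mainly useful for checking a vectorised version against the same inputs.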
An obvious way to do this is to use a movemask intrinsic (_mm256_movemask_ps for eight 32-bit lanes) to get an 8-bit integer out of the bool vector, look up the shuffle mask in a table, and then do a shuffle with that mask.
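For reference, the table-based approach might look roughly like the sketch below. It assumes AVX2, eight 32-bit float lanes, and that a "true" lane in boolVector has its sign bit set (for example all ones, as produced by a compare). The names pack_left_table, init_pack_left_table and pack_left_f32x8 are made up for illustration. The AVX2 path is only compiled when __AVX2__ is defined (for example with -mavx2); otherwise a scalar fallback emulates the same lookup so the sketch still builds.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Hypothetical table: for each 8-bit mask, the source lane of each output lane. */
static uint32_t pack_left_table[256][8];
static int pack_left_table_ready = 0;

/* Entry m lists the positions of the set bits of m in ascending order.
 * Unused trailing slots stay 0, so those output lanes are don't-care values. */
void init_pack_left_table(void)
{
    for (int m = 0; m < 256; ++m) {
        int out = 0;
        for (int lane = 0; lane < 8; ++lane) {
            if (m & (1 << lane)) {
                pack_left_table[m][out++] = (uint32_t)lane;
            }
        }
        while (out < 8) {
            pack_left_table[m][out++] = 0;
        }
    }
    pack_left_table_ready = 1;
}

#if defined(__AVX2__)
static __m256 PackLeft(__m256 inputVector, __m256 boolVector)
{
    /* Collect the sign bit of each 32-bit lane into an 8-bit mask. */
    int m = _mm256_movemask_ps(boolVector);
    /* Fetch the permutation for this mask and apply it across all lanes. */
    __m256i idx = _mm256_loadu_si256((const __m256i *)pack_left_table[m]);
    return _mm256_permutevar8x32_ps(inputVector, idx);
}
#endif

/* Pointer-based wrapper (hypothetical) so the sketch can be called from
 * ordinary code; in and out must not overlap. */
void pack_left_f32x8(const float in[8], const uint32_t keep[8], float out[8])
{
    assert(pack_left_table_ready);
#if defined(__AVX2__)
    __m256 v = _mm256_loadu_ps(in);
    __m256 b = _mm256_castsi256_ps(_mm256_loadu_si256((const __m256i *)keep));
    _mm256_storeu_ps(out, PackLeft(v, b));
#else
    /* Scalar emulation of the movemask + table lookup + permute sequence. */
    unsigned m = 0;
    for (int lane = 0; lane < 8; ++lane) {
        m |= (keep[lane] >> 31) << lane;
    }
    for (int lane = 0; lane < 8; ++lane) {
        out[lane] = in[pack_left_table[m][lane]];
    }
#endif
}
```

With 256 entries of eight 32-bit indices the table occupies 8 KiB, which is the cost a table-free solution would avoid.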
However, I would like to avoid a lookup table if possible. Is there a faster solution?