Friday, November 17, 2023

How to Improve XORing of large uint64 arrays?

I want to XOR large shifted arrays. Below is a portable version of the function, for ease of explanation. How can I improve this computation? I have tried using AVX2 but didn't see much improvement. Currently, for the DB shown in the example below, it takes 50 ms to process everything, which is 12 GB/s. I would appreciate any tips to improve the computation.

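#include <cstdint>
#include <iostream>

uint64_t partition_size = 4096; // entries per partition, must be a power of two
uint64_t entry_size = 256; // bits
uint64_t DB_size = 16777216; // entries
uint64_t *DB = new uint64_t[DB_size * entry_size/64];
// one slot of entry_size/64 words per entry in the partition, zero-initialized
uint64_t *result = new uint64_t[partition_size * entry_size/64]();

//partition_index will be a random multiple of partition_size, e.g. 0, 8192, 4096 etc
//random_offset will be a random number in [0, partition_size]
void xor_shifted_arrays(uint32_t partition_index, uint32_t random_offset)
{
    // entry_size is in bits, so divide by 64 (not sizeof(uint64_t)) to get words per entry
    const uint64_t uint64_per_entry = entry_size / 64;

    uint64_t shift_offset;
    uint64_t shift;

    for (uint64_t i = 0; i < partition_size; i = i + 1)
    {
        // entry i goes to slot (i + random_offset) mod partition_size
        shift = (i + random_offset) & (partition_size - 1);
        shift_offset = shift * uint64_per_entry;

        // portable version: one uint64_t at a time (an AVX2 version would handle 4 per step)
        for (uint64_t j = 0; j < uint64_per_entry; j = j + 1){
            result[shift_offset + j] = result[shift_offset + j] ^ DB[partition_index + j];
        }
        // partition_index is used directly as a uint64_t index into DB
        partition_index = partition_index + uint64_per_entry;
    }
}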
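
One restructuring that may be worth trying: because the destination index is (i + random_offset) mod partition_size, the loop in the question is a rotation, and it wraps around at most once. It can therefore be split into two contiguous runs with no per-entry masking, which compilers can usually auto-vectorize. The sketch below is only an illustration under that assumption. The names xor_run, xor_rotated and xor_masked, and their parameters, are hypothetical and not from the original code; xor_masked is included only as a reference to compare against. If the loop is already limited by memory bandwidth, this may not help much.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// XOR `count` words of src into dst. A flat loop over contiguous memory,
// with no index masking inside it.
static void xor_run(uint64_t* dst, const uint64_t* src, std::size_t count)
{
    for (std::size_t k = 0; k < count; ++k)
        dst[k] ^= src[k];
}

// Hypothetical rewrite: entry i of src is XORed into entry
// (i + random_offset) % num_entries of dst, done as two contiguous runs.
// Both arrays hold num_entries * words_per_entry words.
void xor_rotated(uint64_t* dst, const uint64_t* src,
                 std::size_t num_entries, std::size_t words_per_entry,
                 std::size_t random_offset)
{
    assert(random_offset < num_entries);
    const std::size_t head = num_entries - random_offset; // entries before the wrap

    // src entries [0, head) land in dst entries [random_offset, num_entries)
    xor_run(dst + random_offset * words_per_entry, src, head * words_per_entry);
    // src entries [head, num_entries) wrap around to dst entries [0, random_offset)
    xor_run(dst, src + head * words_per_entry, random_offset * words_per_entry);
}

// Reference version with per-entry masking, in the style of the question.
// Requires num_entries to be a power of two.
void xor_masked(uint64_t* dst, const uint64_t* src,
                std::size_t num_entries, std::size_t words_per_entry,
                std::size_t random_offset)
{
    for (std::size_t i = 0; i < num_entries; ++i) {
        const std::size_t shift = (i + random_offset) & (num_entries - 1);
        for (std::size_t j = 0; j < words_per_entry; ++j)
            dst[shift * words_per_entry + j] ^= src[i * words_per_entry + j];
    }
}
```

With the globals from the question, a call would look like xor_rotated(result, DB + partition_index, partition_size, entry_size / 64, random_offset), assuming partition_index is a uint64_t index into DB and random_offset is strictly less than partition_size.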
