mercredi 23 mai 2018

C++ Integer Trie implementation using a hash_map to reduce memory consumption

I have to implement a Trie of codes of a given fixed-length. Each code is a sequence of integers and considering that some patterns are usual, I decided to implement a Trie in order to store all the codes. I also need to iterate throught the codes given they lexicograph order and I'm expecting to work with millions (maybe billions) of codes.

This is why I considered implementing this particular Trie as a dictionary where each key is the index of a given prefix. Let's say key 0 has a list of his prefix children and for each one i save the corresponding entry on the dictionary... Example: If my first insertion is the code 231, then the dictionary would look like:

[0]->{(2,1)}
[1]->{(3,2)}
[2]->{(1,3)}

This way, if my second insertion would be 243, the dictionary would be updated this way:

[0]->{(2,1)}
[1]->{(3,2),(4,3)} *Here each list is sorted using a flat_map
[2]->{(1,endMark)}
[3]->{(3,endMark)}

My problem is that I have been using a vector for this purpuse and because having all the dictionary in contiguos memory allows me to have a better performance while iterating over it. Now, when I need to work with BIG instances of my problem, due to resizing the vector I cannot work with millions of codes (memory consuption could be as much as 200GB). Now I have tried google's sparse hash insted of the vector and my question is, do you have any suggestion? any other alternative in mind? Is there any other way to work with integers as keys to improve performance? I know I wont have any collision because each key would be different from the rest.

Best regards, Quentin

Aucun commentaire:

Enregistrer un commentaire