I am trying to implement space efficient Trie data structure as mentioned in the below paper.( completion Trie )
Space-Efficient Data Structures for Top-k Completion
With above reference I did following steps while making data structure.
1) I created Trie tree ( without compressed ) with 1000 words.
2) Then I traversed tree by level order and created few vectors.
-
vector< bitset<5> > noOfChildren;
How many children particular node has. A node can have ( 0 to 26 children) so it can be represented with 5 bits. -
vector< bitset<5> > charsInBits;
The character of a particular node is converted into 5 bits. -
vector< int> startPosition
It is a prefix sum of vector noOfChildren. Which gives the Index of 1st child of a particular node. -
vector<bitset<1> > leafNode
It stores whether that node is leaf node or not. 0 for no leaf and 1 for leaf.Now I stored all these vectors data in different files.(.txt)
( Above procedure was one time procedure. )
Now when I run the program, I fetch all the data from (.txt) files into respective vectors.
Then I process all these vectors and successfully traverse through Trie tree.
Few questions I have:
1) Is this a compact Trie representation as I have not stored Trie node address in memory ?
2) The files in which I have stored the above vectors data is giving me more size
than my actual 1000 words file size. So I am confused , whether I have compressed Trie or not ?
3) For more than 1000 words my program gives me "Segmentation fault". It does not even create Trie tree. Why this is happening ?
Note : I have implemented above logic in program and my program is perfectly working
for 1000 words.
I have not implemented exactly what was said in above research paper.
Aucun commentaire:
Enregistrer un commentaire