jeudi 5 avril 2018

Prefix Tree of std:string codes with undefined behaviour (segmentation faults)

I am trying to implement a prefix tree of a list of integers encoded with std::string. The following would be an example of such codes:

2-2-20200
2-20-2020
2-20000-2
2-200002
2-2020-20
2-220-200
20-2-2002
20-200-20
20-20020
20-2200-2
200-2-200
200-2200
2000-2-22

It is important to have all the nodes within the same memory block (or blocks) because I 'll be performing several look-ups and I would like to take advantage of the cache. This is why I decided to implement the tree by using an std::vector. This implementation is based on Chandan Mittal's implementation of PrefixTree for strings (see: https://github.com/calmhandtitan/algorepo/blob/master/stringTheory/trie.cpp).

Here is my (not working) implementation:

    struct node
{
    bool isEnd = false;
    int child [11] = {};
    node() = default;
};

class PrefixTree{
private:
    std::vector<node> chunk;
    int index;
    int getPosition(char val){
        switch(val){
            case '0':
                return 0;
            case '1':
                return 1;
            case '2':
                return 2;
            case '3':
                return 3;
            case '4':
                return 4;
            case '5':
                return 5;
            case '6':
                return 6;
            case '7':
                return 7;
            case '8':
                return 8;
            case '9':
                return 9;
            case '-':
                return 10;
            default:
                throw std::invalid_argument("Invalid Argument");
        }
    }
public:
    int getIndex() const {
        return index;
    }

    PrefixTree(){
        chunk.resize(1);
        index = 0;
    }
    ~PrefixTree(){
        chunk.clear();
        chunk.shrink_to_fit();
    }

    void insert(const std::string& word)
    {
        int aux = 0;
        for (std::string::size_type i = 0; word[i] != '\0'; ++i) {
            int letter = getPosition(word[i]);
            if(chunk[aux].child[letter] == 0) {
                index++;
                chunk.resize(static_cast<unsigned long>(index));
                chunk[aux].child[letter] = index;
            }
            aux = index;
        }
        std::cout << index << std::endl;
        chunk[aux].isEnd = true;
    }
    bool search(const std::string& word)
    {
        int aux = 0;
        for (std::string::size_type i = 0; word[i] != '\0'; ++i) {
            int letter = getPosition(word[i]);
            if(chunk[aux].child[letter] == 0){
                return false;
            }
            aux = chunk[aux].child[letter];
        }
        return chunk[aux].isEnd;
    }

    bool is_prefix(const std::string& prefix)
    {
        int aux = 0;
        for (std::string::size_type i = 0; prefix[i] != '\0'; ++i) {
            int letter = getPosition(i);
            if(chunk[aux].child[letter] == 0){
                return false;
            }else
                aux = chunk[aux].child[letter];
        }
        return true;
    }
};

Once I perform several insertions in order to search right after for the inserted elements I got segmentation faults or incorrect results.

You can verify this by running the code with the above examples. Could you tell me how to fix this problem? Any advise regarding the implementation? or about an efficient way to do it?

Thanks in advance!

Aucun commentaire:

Enregistrer un commentaire