lundi 27 février 2017

Fastest ways to check if a value already exists in a stl container

I am holding a very big list of memory addresses (around 400.000) and need to check if a certain address already exists in it 400.000 times a second.

A code example to illustrate my setup:

std::set<uintptr_t> existingAddresses; // this one contains 400.000 entries

while (true) {
    // a new list with possible new addresses
    std::set<uintptr_t> newAddresses; // also contains about ~400.000 entries

    // in my own code, these represent a new address list
    for (auto newAddress : newAddresses) {

        // already processed this address, skip it
        if (existingAddresses.find(newAddress) != existingAddresses.end()) {
          continue;
        }

        // we didn't have this address yet, so process it.
        SomeHeavyTask(newAddress);

        // so we don't process it again
        existingAddresses.emplace(newAddress);
    }

    Sleep(1000);
}

This is the first implementation I came up with and I think it can be greatly improved.

Next I came up with using some custom indexing strategy, also used in databases. The idea is to take a part of the value and use that to index it in its own group set. If I would take for example the last two numbers of the address I would have 16^2 = 256 groups to put addresses in.

So I would end up with a map like this:

[FF] -> all address ending with `FF`
[EF] -> all addresses ending with `EF`
[00] -> all addresses ending with `00`
// etc...

With this I will only need to do a lookup on ~360 entries in the corresponding set. Resulting in ~360 lookups being done 400.000 times a second. Much better!

I am wondering if there are any other tricks or better ways to do this? My goal is to make this address lookup as FAST as possible.

Aucun commentaire:

Enregistrer un commentaire