Friday, September 30, 2016

Divide Vector into chunks with constant memory

Suppose you need to work with a huge number of elements (1 billion+) stored in a vector, and at some point you would like to take all of these elements and divide them into groups. Specifically, we would like to do the following:

std::vector<std::vector<int>> groups(100);
for (size_t i = 0; i < big_vector.size(); ++i) {
    groups[i % 100].push_back(big_vector[i]);
}
big_vector.resize(0);
big_vector.shrink_to_fit();

However, since big_vector is really massive, it is quite inconvenient to have the data duplicated in memory. This probably cannot be avoided, though, because of the vector's contiguous memory allocation and its inability to shrink in place without copying all of the data (correct me if I am wrong).
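To illustrate what I mean (assuming my understanding is correct), even the usual ways of releasing a vector's unused capacity go through a reallocation, so for a moment both the old and the new buffer exist:

#include <vector>

// Sketch of my understanding: releasing spare capacity allocates a smaller
// buffer and moves the remaining elements into it, so the data briefly
// exists twice before the old storage is freed.
void release_unused(std::vector<int>& v) {
    v.shrink_to_fit();              // non-binding request; typically reallocates and copies
    // pre-C++11 idiom with the same effect:
    // std::vector<int>(v).swap(v);
}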

The question, then, is what other structure to use to store this big data. I thought of writing a custom container that would internally store the data in a std::vector<std::array<int, SIZE>>, where SIZE is large enough to avoid having too many chunks, but small enough not to cause a noticeable duplicate-memory overhead. Is there a more standard (or Boost) way of doing this, or is writing a custom container my best bet?

To further specify my expectations of the container: it would be nice if it had an interface similar to vector's (fast random access, etc.). If necessary, though, I could probably get by without random access, with only the ability to push and read elements. Those operations have to be very fast in any case.
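To make the idea more concrete, here is a rough sketch of the kind of chunked container I have in mind (chunked_vector and CHUNK_SIZE are just placeholder names of mine, nothing standard):

#include <array>
#include <cstddef>
#include <vector>

// Sketch only: elements are stored in fixed-size chunks kept in an outer
// vector, as described above. push_back appends a fresh chunk whenever the
// last one fills up.
template <typename T, std::size_t CHUNK_SIZE = 1 << 16>
class chunked_vector {
public:
    void push_back(const T& value) {
        if (size_ % CHUNK_SIZE == 0)
            chunks_.emplace_back();          // last chunk is full, start a new one
        chunks_[size_ / CHUNK_SIZE][size_ % CHUNK_SIZE] = value;
        ++size_;
    }

    T& operator[](std::size_t i) {           // random access = chunk index + offset
        return chunks_[i / CHUNK_SIZE][i % CHUNK_SIZE];
    }

    std::size_t size() const { return size_; }

private:
    std::vector<std::array<T, CHUNK_SIZE>> chunks_;  // chunks stored by value
    std::size_t size_ = 0;
};

With something like this, indexing is just a division and two lookups, so I would hope it stays reasonably close to plain vector indexing in speed, but I am not sure this is the right approach overall.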
