dimanche 22 janvier 2017

Find the indices and inverse mapping of a unique vector

I have a std::vector<int> with duplicate values. I can find the unique values using std::unique and std::vector::erase, but how can I efficiently find the vector of indices and construct the original vector given the vector of unique values, through an inverse mapping vector. Allow me to illustrate this using an example:

std::vector<int> vec  = {3, 2, 3, 3, 6, 5, 5, 6, 2, 6};
std::vector<int> uvec = {3, 2, 6, 5}; // vector of unique values
std::vector<int> idx_vec = {0, 1, 4, 5}; // vector of indices
std::vector<int> inv_vec = {0, 1, 0, 0, 2, 3, 3, 2, 1, 2}; // inverse mapping

The inverse mapping vector is such that with its indices one can construct the original vector using the unique vector i.e.

std::vector<int> orig_vec(ivec.size()); // construct the original vector
std::for_each(ivec.begin(), ivec.end(), 
    [&uvec,&inv_vec,&orig_vec](int idx) {orig_vec[idx] = uvec[inv_vec[idx]];});

And the indices vector is simply a vector indices of first occurrence of unique values in the original vector.

My rudimentary solution is far from efficient. It does not use STL algorithms and is O(n^2) at worst.

template <typename T> 
inline std::tuple<std::vector<T>,std::vector<int>,vector<int>>
unique_idx_inv(const std::vector<T> &a) {
    auto size_a = size(a);
    std::vector<T> uniques;
    std::vector<int> idx; // vector of indices
    vector<int> inv(size_a); // vector of inverse mapping

    for (auto i=0; i<size_a; ++i) {
        auto counter = 0;
        for (auto j=0; j<uniques.size(); ++j) {
            if (uniques[j]==a[i]) {
                counter +=1;
                break;
            }
        }
        if (counter==0) {
            uniques.push_back(a[i]);
            idx.push_back(i);
        }
    }

    for (auto i=0; i<size_a; ++i) {
        for (auto j=0; j<uniques.size(); ++j) {
            if (uniques[j]==a[i]) {
                inv[i] = j;
                break;
            }
        }
    }

    return std::make_tuple(uniques,idx,inv);
}

Comparing this with the typical std::sort+std::erase+std::unique approach (which by the way only computes unique values and not indices or inverse), I get the following timing on my laptop with g++ -O3 [for a vector of size=10000 with only one duplicate value]

Find uniques+indices+inverse:                       145ms
Find only uniques using STL's sort+erase+unique     0.48ms

Of course the two approaches are not exactly identical, as the latter one sorts the indices, but still I believe the solution I have posted above can be optimised considerably. Any thoughts how on I can achieve this?

Aucun commentaire:

Enregistrer un commentaire