jeudi 31 mars 2016

Merging vectors without extra memory

I came across this code segment where two vectors are merged where elements from one vector is favored in case of duplication:

std::vector<String> fields1 = fieldSource1.get();
std::vector<String> fields2 = fieldSource2.get();
// original
fields1.insert(std::end(fields1), std::begin(fields2), std::end(fields2));
std::stable_sort(std::begin(fields1), std::end(fields1));
fields.erase(std::unique(std::begin(fields), std::end(fields)), std::end(fields));
return fields1;

Given that Strings are unique in their respective vector, and that order of Strings in output vector is irrelevent, I think that I can make this algorithm more efficient.

I would like to avoid extra memory allocation of std::set_union() and std::set_diff().

(Directly inserting std::set_diff to an original vector is not an option due to iterator invalidation during resizing)

I ended up with this, which is std::set_diff with one iterator replaced with an index:

std::sort(std::begin(fields1), std::end(fields1));
std::sort(std::begin(fields2), std::end(fields2));
// Initialize iterators by index in case of resizing
size_t index = 0;
size_t end = std::size(fields1);
std::remove_copy_if(std::begin(fields2), std::end(fields2), std::back_inserter(fields1),
[&fields1, &index, end](String field)->bool{
    auto begin = std::begin(fields1);
    found = std::lower_bound(begin+index, begin+end, field);
    index = std::distance(begin, found);
    return (*found) == field;
});
return fields1;

My question is: can I make this merge operation more efficient? If not, can I make it more readable?

Aucun commentaire:

Enregistrer un commentaire