mercredi 3 février 2016

Why does Microsoft std::vector::insert use rotate()?

I was doing some experiments with lists vs vectors, and I noticed that the Microsoft implementation of std::vector is doing the following for .insert:

iterator insert(const_iterator _Where, _Ty&& _Val)
    {   // insert by moving _Val at _Where
    return (emplace(_Where, _STD move(_Val)));
    }

    iterator emplace(const_iterator _Where \
        COMMA LIST(_TYPE_REFREF_ARG)) \
    {   /* insert by moving _Val at _Where */ \
    size_type _Off = _VIPTR(_Where) - this->_Myfirst; \
    _VECTOR_EMPLACE_CHECK \
    emplace_back(LIST(_FORWARD_ARG)); \
    _STD rotate(begin() + _Off, end() - 1, end()); \
    return (begin() + _Off); \
    }

First it inserts the element at the back of the vector, and then calls rotate and swaps the last elements with an element starting at begin+off, until we reach the end. From what I understand, it does this:

vec: 01234, insert(2, 5)

vec: 012345

vec: 012345 -> 015342

vec: 015342 -> 015243

vec: 015243 -> 015234

This is not friendly with your cache, because it always swaps with the last element when it does that rotate. I did some benchmarks, where I held the element in a temporary and used that to swap elements and it was faster: This was it:

push_back(value); //My vector doesn't have resize/grow implemented
T tmp = *(end() - 1);
while(new_location != end())
{
    std::swap(tmp, *new_location);
    new_location++;
}

The full code and tests are here: http://ift.tt/1o6S5RP

First question:

Why does it do that rotate instead of the second version of insert that I presented here?

The second version is a more cache-friendly alternative compared with the first one. For large vectors swapping with the last element in the vector introduces a time penalty due to the cache.

Is it in order to avoid storing another temporary?

Second question:

Why doesn't it just memmove the elements one position to the right?

Is there a standard requirement that forces you to swap elements instead of calling memmove? It's interesting that for POD there is not a special template specialization that just memmoves stuff around. In any case I am more interested in why rotate and not a more cache friendly alternative is used.

In my tests this is even faster than the previous two versions.

Tests are done like this:

0) for i = 0 to count

1) Pick a random position in a vector

2) Touch each element from 0 to that position (force a read on it)

3) Call insert after we reach the position

Compiled with Visual Studio 2012 x86, /O2.

For count = 100 000, element size = 4 bytes:

std::vector:                7.5 seconds   
std::list:                 19.6 seconds                            
MyVector:                   3.2 seconds                              
MyVector using memmove:     2.1 seconds

For count = 200 000, element size = 4 bytes:
std::vector:                30.3 seconds                          
std::list:                  45.5 seconds                          
MyVector:                   13.1 seconds
MyVector using memmove:      8.7 seconds   

For count = 20 000, element size = 128 bytes:
std::vector:                5.36 seconds
std::list:                  1.37 seconds
MyVector:                   5.12 seconds
MyVector (memmove)          1.68 seconds

I know this is not a real life thing that you would do, these were some experiments that I did in order to show that cache matters, and I accidentally discovered the way that std vector insert works.

Also I know MyVector is a bad implementation of a vector. I just quickly wrote it in order to test my assumptions for insert. I just want to discuss the insert() implementation, not Vector class design :).

Thanks for reading this

Aucun commentaire:

Enregistrer un commentaire