I'm trying to construct a set of unique words from a list of entries, each of which has a vector of strings.
So I made a function called Insert, which gets called for each of the entries like this:
    for( auto & e : _Entries )
        _Dictionary.Insert( begin( e.getNameWords( ) ), end( e.getNameWords( ) ) );
The _Dictionary object internally holds a std::set named _AllWords, and I wrote the function Insert as follows:
    template< typename InputIterator >
    void Insert( InputIterator first, InputIterator last )
    {
        for( auto it = first ; it != last ; ++it )
            _AllWords.insert( *it );
    }
In my case, calling Insert for all entries in _Entries took an average of 570 milliseconds.
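For reference, a measurement like this can be taken with std::chrono. The following is only a minimal sketch of such a timing helper, not necessarily how the numbers above were obtained; the name elapsedMilliseconds is hypothetical.

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds, measured with a monotonic clock.
template <typename F>
double elapsedMilliseconds(F&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

The loop over _Entries would be passed in as a lambda, and the call repeated several times to get an average.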
Then I thought I should use what the STL already provides instead of the hand-written for loop, so I changed Insert to the following:
    template< typename InputIterator >
    void Insert( InputIterator first, InputIterator last )
    {
        copy( first, last, inserter( _AllWords, begin( _AllWords ) ) );
    }
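To make the comparison concrete, here is a small self-contained sketch of the two variants side by side, plus the set's own range insert as a third option. The Dictionary class, the member name words_ (standing in for _AllWords), and the function names are assumptions for illustration, since the real class is not shown. One difference that is certain: std::inserter goes through the hinted overload insert(hint, value), while the loop calls the unhinted insert(value). Whether that accounts for the timing gap is not established here.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the dictionary class; only the set and the
// insert variants are modelled.
class Dictionary
{
public:
    // Variant 1: explicit loop, one unhinted insert per word.
    template <typename InputIterator>
    void InsertLoop(InputIterator first, InputIterator last)
    {
        for (auto it = first; it != last; ++it)
            words_.insert(*it);
    }

    // Variant 2: std::copy through std::inserter. Each assignment performs
    // hint = words_.insert(hint, value); ++hint; starting from begin().
    template <typename InputIterator>
    void InsertCopy(InputIterator first, InputIterator last)
    {
        std::copy(first, last, std::inserter(words_, words_.begin()));
    }

    // Variant 3: the set's own range insert.
    template <typename InputIterator>
    void InsertRange(InputIterator first, InputIterator last)
    {
        words_.insert(first, last);
    }

    const std::set<std::string>& words() const { return words_; }

private:
    std::set<std::string> words_;
};

// Returns true when all three variants build the same set from `input`.
inline bool variantsAgree(const std::vector<std::string>& input)
{
    Dictionary a, b, c;
    a.InsertLoop(input.begin(), input.end());
    b.InsertCopy(input.begin(), input.end());
    c.InsertRange(input.begin(), input.end());
    return a.words() == b.words() && b.words() == c.words();
}
```

All three produce the same set contents; they differ only in which insert overload is called for each element.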
I was expecting this to 1) be more correct, and 2) be at least as fast, if not faster (guided by the philosophy of letting the STL do as much of the work as possible). However, I was surprised to find that this implementation actually took longer: not by much, but consistently about 200 milliseconds more than the previous for-loop-based implementation.
I know this is an essentially trivial speed difference, but I'm still surprised.
So my question is: why is my for-loop implementation faster than the one using copy and inserter?
Note: I am compiling this with clang 3.5.2 and the libc++ standard library, using the -O3 flag, on Ubuntu 14.04.