c++11: Why copying STL implementation doesn't have the same execution time?

jeudi 23 février 2017

Why copying STL implementation doesn't have the same execution time?

Recently I've been studying the STL implementation of std::sort and std::stable_sort, but I stumbled upon a weird behavior. I copied libc++ sort() implementation and all the helper functions and ran a benchmark on it using hayai - GitHub.

I compared the copied STL implementation to a custom merge_sort implementation. To my surprise, the merge_sort algorithm had a better execution time:

c_merge_sort : 14788 µs
c_stl_sort   : 20355 µs

Immediately I knew something was wrong, cause I knew for a fact I could further improve my own implementation of merge_sort. So I ran a second benchmark comparing the copied implementation to the std::sort itself.

c_stl_sort : 22369 µs
std::sort  :  5478 µs

I double checked my code and everything was identical to the library implementation. I then decided to just edit the Library file Xcode 8.0 uses, specifically the /include/algorithm header file. I simply copied one of the sort() functions definitions and renamed the function as follows:

template <class _RandomAccessIterator>
inline _LIBCPP_INLINE_VISIBILITY
void sort(_RandomAccessIterator __first, _RandomAccessIterator __last) {
    _VSTD::sort(__first, __last, __less<typename iterator_traits<_RandomAccessIterator>::value_type>());
}

template <class _RandomAccessIterator>
inline _LIBCPP_INLINE_VISIBILITY
void kappa(_RandomAccessIterator __first, _RandomAccessIterator __last) {
    _VSTD::sort(__first, __last, __less<typename iterator_traits<_RandomAccessIterator>::value_type>());
}

Benchmark on std::kappa and std::sort:

std::kappa : 21031 µs
std::sort  :  5859 µs

This just blew my mind, even editing the Library file itself and just renaming the function caused the algorithm to execute differently. I tried with and without inline and the result was the same.

Could someone help me understand this behavior?

Relevant Information:

I'm using the default Apple LLVM 8.0 compiler.
I used a vector container on both instances.
The vector had the same elements, but were different memory blocks. So one execution didn't interfere with the other.

c++11

jeudi 23 février 2017

Why copying STL implementation doesn't have the same execution time?

Aucun commentaire:

Enregistrer un commentaire