jeudi 23 février 2017

Why copying STL implementation doesn't have the same execution time?

Recently I've been studying the STL implementation of std::sort and std::stable_sort, but I stumbled upon a weird behavior. I copied libc++ sort() implementation and all the helper functions and ran a benchmark on it using hayai - GitHub.

I compared the copied STL implementation to a custom merge_sort implementation. To my surprise, the merge_sort algorithm had a better execution time:

c_merge_sort : 14788 µs
c_stl_sort   : 20355 µs

Immediately I knew something was wrong, cause I knew for a fact I could further improve my own implementation of merge_sort. So I ran a second benchmark comparing the copied implementation to the std::sort itself.

c_stl_sort : 22369 µs
std::sort  :  5478 µs

I double checked my code and everything was identical to the library implementation. I then decided to just edit the Library file Xcode 8.0 uses, specifically the /include/algorithm header file. I simply copied one of the sort() functions definitions and renamed the function as follows:

template <class _RandomAccessIterator>
inline _LIBCPP_INLINE_VISIBILITY
void sort(_RandomAccessIterator __first, _RandomAccessIterator __last) {
    _VSTD::sort(__first, __last, __less<typename iterator_traits<_RandomAccessIterator>::value_type>());
}

template <class _RandomAccessIterator>
inline _LIBCPP_INLINE_VISIBILITY
void kappa(_RandomAccessIterator __first, _RandomAccessIterator __last) {
    _VSTD::sort(__first, __last, __less<typename iterator_traits<_RandomAccessIterator>::value_type>());
}

Benchmark on std::kappa and std::sort:

std::kappa : 21031 µs
std::sort  :  5859 µs

This just blew my mind, even editing the Library file itself and just renaming the function caused the algorithm to execute differently. I tried with and without inline and the result was the same.

Could someone help me understand this behavior?

Relevant Information:

I'm using the default Apple LLVM 8.0 compiler.
I used a vector container on both instances.
The vector had the same elements, but were different memory blocks. So one execution didn't interfere with the other.

Aucun commentaire:

Enregistrer un commentaire