Tuesday, May 21, 2019

Why isn't clang eliding copy in this function with auto return type?

I've found a case where clang 8.x fails to elide a copy of an object of a class template specialization, although gcc and msvc elide it without trouble. In my real application this superfluous copy is quite expensive, so I want to get to the bottom of it and come away with a better understanding of when copy elision is and isn't performed in C++17.

The code snippet below shows the problem. A function template declared with an auto return type that returns a named local object performs an extra copy construction in its body. If the return statement is rewritten to return an unnamed temporary, the copy is elided. If the function is declared with an explicit return type of A<P> instead of auto, the copy is also elided.

If struct A is not a template, the copy is elided as well.

The problem appears whether or not everything is noexcept, and whether or not inlining is allowed (NOINLINE is there only so that you can see the problem in Godbolt without having to execute the code).

// compiled with -O2 -std=c++17
#if defined(_MSC_VER) && !defined(__clang__)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

template<int P>
struct A {
  int data = 0;
  NOINLINE explicit A(int data_) noexcept : data(data_) { }
  NOINLINE ~A() noexcept { }
  NOINLINE A(const A& other) noexcept : data(other.data) { }
};


template <int P>
NOINLINE auto return_auto_A_nrvo(const A<P>& a) noexcept {
/* clang 6.0 thru 8.0 doesn't elide copy of 'result': 
   gcc and msvc elide the copy as expected.
        mov     r14, rsp
        mov     rdi, r14
        call    A<0>::A(A<0> const&)
        mov     rdi, rbx
        mov     rsi, r14
        call    A<0>::A(A<0> const&)
        mov     rdi, r14
        call    A<0>::~A() [base object destructor]

* return A<P>(a); is fully optimized
*/
  A<P> result(a);
  return result;
}

template <int P>
NOINLINE A<P> return_A_nrvo(const A<P>& a) noexcept {
// NRVO with explicit return type: fully optimized
  A<P> result(a);
  return result;
}

template <int P>
NOINLINE auto return_auto_A_rvo(const A<P>& a) noexcept {
// RVO: fully optimized
  return A<P>(a);
}

NOINLINE int main() {
  auto a1 = A<1>(42);
  auto a2 = return_auto_A_nrvo(a1);
  auto a3 = return_A_nrvo(a1);
  auto a4 = return_auto_A_rvo(a1);

  return a2.data + a3.data + a4.data;
}

The comment in return_auto_A_nrvo() shows the code clang generates: the un-elided copy is visible as a second call to the copy constructor, followed by a destructor call for the local object. The other variants all compile to fully elided code.

This Godbolt link shows the code generated by GCC, clang, and msvc: https://www.godbolt.org/z/FDAvQO.
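For readers who would rather count copies at run time than read the assembly, here is a minimal sketch. The type Counted and the helpers copies_for_nrvo() and copies_for_rvo() are hypothetical names introduced for illustration and are not part of the code above. It assumes -std=c++17 (for the inline static member) and leaves out the NOINLINE markers.

```cpp
#include <cassert>

// Hypothetical instrumented stand-in for A<P>: counts copy constructions.
template <int P>
struct Counted {
  static inline int copies = 0;  // C++17 inline variable, one counter per P
  int data = 0;
  explicit Counted(int d) noexcept : data(d) {}
  Counted(const Counted& other) noexcept : data(other.data) { ++copies; }
};

// Same shape as return_auto_A_nrvo(): auto return type, named local.
template <int P>
auto counted_auto_nrvo(const Counted<P>& a) noexcept {
  Counted<P> result(a);  // this copy is always required
  return result;         // a second copy happens only if NRVO is not applied
}

// Same shape as return_auto_A_rvo(): auto return type, unnamed temporary.
template <int P>
auto counted_auto_rvo(const Counted<P>& a) noexcept {
  return Counted<P>(a);  // prvalue: the only copy is the one from 'a'
}

// Copy constructions performed by one call of the named-local variant.
inline int copies_for_nrvo() {
  Counted<1> src(42);
  Counted<1>::copies = 0;
  auto r = counted_auto_nrvo(src);
  assert(r.data == 42);
  return Counted<1>::copies;
}

// Copy constructions performed by one call of the temporary variant.
inline int copies_for_rvo() {
  Counted<2> src(42);
  Counted<2>::copies = 0;
  auto r = counted_auto_rvo(src);
  assert(r.data == 42);
  return Counted<2>::copies;
}
```

Calling copies_for_nrvo() from main() and printing the result gives 1 if the compiler applied NRVO and 2 if it did not. The value depends on the compiler and version, so it is deliberately not stated here.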

Perhaps this is just a bug or a missed optimization in clang that gcc and msvc do not share. If that's the case, I'll try to find the appropriate place to report it to the clang developers. But I feel there may be something deeper going on here, such as a fundamental difference between returning auto and returning an explicitly named class template specialization. I believe that C++17 guarantees elision when returning an unnamed temporary (RVO), but that elision for a named local (NRVO), as in my case, is permitted and not guaranteed. I would like to understand why that is the case, and why it applies here.
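To make the distinction between the guaranteed and the optional case concrete, here is a small sketch, assuming -std=c++17. The type Pinned and the functions make_prvalue() and pinned_value() are hypothetical names introduced for illustration. A type with deleted copy and move constructors can still be returned as an unnamed temporary, even through an auto return type, but not as a named local.

```cpp
#include <cassert>

// Hypothetical type for illustration: neither copyable nor movable.
struct Pinned {
  int data;
  explicit Pinned(int d) noexcept : data(d) {}
  Pinned(const Pinned&) = delete;
  Pinned(Pinned&&) = delete;
};

// Returning a prvalue: in C++17 the caller's object is initialized
// directly, so no copy or move constructor is needed. The deduced
// return type (auto) makes no difference here.
auto make_prvalue(int d) { return Pinned(d); }

// Returning a named local is different: the copy or move must still be
// well-formed even when the compiler elides it, so this is rejected:
//
//   Pinned make_named(int d) { Pinned result(d); return result; }
//   // error: use of deleted constructor

// Initializes a local directly from the prvalue and reads it back.
int pinned_value(int d) {
  Pinned p = make_prvalue(d);  // accepted in C++17, rejected in C++14
  assert(p.data == d);
  return p.data;
}
```

This shows only what the language requires. It does not explain why clang skips the optional elision in the auto case above.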
