Thursday, September 3, 2020

Why does std::tuple break the small-struct calling-convention optimization in C++?

C++ compilers apply a calling-convention optimization for small structs: when a small struct is passed by value, the compiler treats it like a primitive type and passes it just as efficiently (for example, in registers). For example:

class MyInt { int n; public: MyInt(int x) : n(x){} };
void foo(int);
void foo(MyInt);
void bar1() { foo(1); }
void bar2() { foo(MyInt(1)); }

bar1() and bar2() generate almost identical assembly code, except that they call foo(int) and foo(MyInt) respectively. Specifically, on x86_64 it looks like this:

        mov     edi, 1
        jmp     foo(MyInt)

But with std::tuple<int>, the result is different:

#include <tuple>

void foo(std::tuple<int>);
void bar3() { foo(std::tuple<int>(1)); }

struct MyIntTuple : std::tuple<int> { using std::tuple<int>::tuple; };
void foo(MyIntTuple);
void bar4() { foo(MyIntTuple(1)); }

The generated assembly looks completely different; the small struct (std::tuple<int>) is passed through a pointer:

        sub     rsp, 24
        lea     rdi, [rsp+12]
        mov     DWORD PTR [rsp+12], 1
        call    foo(std::tuple<int>)
        add     rsp, 24
        ret
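The same questions can be asked about std::tuple<int> and MyIntTuple. Because the answer depends on the standard library implementation, this sketch (call_traits is a hypothetical helper) reports the traits as a string instead of asserting them. My understanding is that libstdc++ gives tuple's internal base classes a user-provided move constructor, which would make std::tuple<int> non-trivial for the purposes of calls under the Itanium C++ ABI; running a check like this against your own toolchain is the way to confirm it.

```cpp
#include <string>
#include <tuple>
#include <type_traits>

struct MyIntTuple : std::tuple<int> { using std::tuple<int>::tuple; };

// Summarize the special members the Itanium C++ ABI looks at when deciding
// whether a class argument can be passed in registers.
template <class T>
std::string call_traits() {
    std::string s;
    s += std::is_trivially_copy_constructible<T>::value ? "copy:trivial " : "copy:non-trivial ";
    s += std::is_trivially_move_constructible<T>::value ? "move:trivial " : "move:non-trivial ";
    s += std::is_trivially_destructible<T>::value ? "dtor:trivial" : "dtor:non-trivial";
    return s;
}
```

Printing call_traits<std::tuple<int>>() next to call_traits<int>() shows which member, if any, is the non-trivial one. MyIntTuple should report exactly what its base reports, since inheriting constructors do not replace its implicitly declared copy and move constructors.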

I dug a bit deeper and tried to make my int a bit dirtier:

class Empty {};
class MyDirtyInt : protected Empty, MyInt {public: using MyInt::MyInt; };
void foo(MyDirtyInt);
void bar5() { foo(MyDirtyInt(1)); }

but the calling-convention optimization still applies:

        mov     edi, 1
        jmp     foo(MyDirtyInt)
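Checking MyDirtyInt the same way is consistent with that result: adding an empty base and a non-public base whose own special members are all trivial leaves MyDirtyInt's copy constructor, move constructor and destructor trivial as well. This is a sketch assuming the Itanium C++ ABI (GCC and Clang on x86_64 Linux), where those three members are what matter for passing in registers.

```cpp
#include <type_traits>

class MyInt { int n; public: MyInt(int x) : n(x){} };
class Empty {};
class MyDirtyInt : protected Empty, MyInt { public: using MyInt::MyInt; };

// Bases with trivial special members do not make the derived class's
// implicitly declared copy/move constructors or destructor non-trivial.
static_assert(std::is_trivially_copy_constructible<MyDirtyInt>::value, "copy ctor should be trivial");
static_assert(std::is_trivially_move_constructible<MyDirtyInt>::value, "move ctor should be trivial");
static_assert(std::is_trivially_destructible<MyDirtyInt>::value, "destructor should be trivial");
```

On GCC and Clang the empty base also takes no space (the empty base optimization), so MyDirtyInt stays int-sized.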

I tried GCC, Clang and MSVC, and they all showed the same behavior. (godbolt link here) So I guess this must come from something in the C++ standard (though I believe the standard doesn't impose any ABI constraints)?

I'm aware that the compiler should be able to optimize this away as long as the definition of foo(std::tuple<int>) is visible and not marked noinline. But I want to know which part of the standard or of the implementation prevents this optimization.

FYI, in case you're curious about what I'm doing with std::tuple: I want to create a wrapper class (i.e. a strong typedef), and I don't want to declare the comparison operators myself (operator<=> only arrived in C++20) or bother with Boost, so std::tuple seemed like a good base class because everything is already there.
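If the goal is only to avoid hand-writing comparisons before C++20, one alternative that keeps the wrapper trivially copyable (and therefore, on the Itanium C++ ABI, eligible for register passing) is a small CRTP base that forwards every comparison to std::tie. This is a sketch, not a drop-in replacement: TieComparable, Meters and the tie() member are hypothetical names, and unlike a std::tuple base it needs one tie() member per wrapper.

```cpp
#include <tuple>
#include <type_traits>

// Hypothetical CRTP helper: Derived must provide a const member tie()
// returning a std::tuple of const references to its fields.
template <class Derived>
struct TieComparable {
    friend bool operator==(const Derived& a, const Derived& b) { return a.tie() == b.tie(); }
    friend bool operator!=(const Derived& a, const Derived& b) { return a.tie() != b.tie(); }
    friend bool operator< (const Derived& a, const Derived& b) { return a.tie() <  b.tie(); }
    friend bool operator> (const Derived& a, const Derived& b) { return a.tie() >  b.tie(); }
    friend bool operator<=(const Derived& a, const Derived& b) { return a.tie() <= b.tie(); }
    friend bool operator>=(const Derived& a, const Derived& b) { return a.tie() >= b.tie(); }
};

// Hypothetical strong typedef. The hidden friends above are found by
// argument-dependent lookup because TieComparable<Meters> is a base class,
// even though it is inherited privately.
class Meters : private TieComparable<Meters> {
    int n;
public:
    explicit Meters(int x) : n(x) {}
    auto tie() const { return std::tie(n); }  // std::tuple<const int&>
};

// The std::tuple only exists temporarily inside the operators, so Meters
// keeps trivial copy/move constructors and a trivial destructor.
static_assert(std::is_trivially_copyable<Meters>::value, "Meters should stay trivially copyable");
```

The design choice here is to reuse std::tuple's lexicographic comparison without making std::tuple a base or a member, so the wrapper's own special members are never affected by how the standard library implements tuple.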
