Thursday, 3 September 2015

Why can't Clang optimise away std::initializer_list?

Why can't Clang 3.6 generate the same assembly for versions a and b of the following program?

#include <iostream>
#include <algorithm>

int main(int argc, char** argv)
{
    std::cout << std::max(3, argc) << std::endl; // a
    //std::cout << std::max({3, argc}) << std::endl; // b

    return 0;
}

Compiled with -O3, the relevant bits of assembly for a and b respectively are:

##a
    cmpl    $2, %edi
    movl    $3, %esi
    cmovgl  %edi, %esi
    movq    __ZNSt3__14coutE@GOTPCREL(%rip), %rdi

##b
    movq    ___stack_chk_guard@GOTPCREL(%rip), %r15
    movq    (%r15), %r15
    movq    %r15, -32(%rbp)
    leaq    -40(%rbp), %rcx
    movl    $3, -40(%rbp)
    leaq    -36(%rbp), %rax
    movl    %edi, -36(%rbp)
    movl    $3, %esi
    leaq    -32(%rbp), %r8
    movq    %rcx, %rdx
    jmp LBB0_1
    .align  4, 0x90
LBB0_2:
    movl    (%rbx), %edi
    movq    %rax, %rcx
    movq    %rbx, %rax
LBB0_1:
    cmpl    %edi, %esi
    cmovlq  %rax, %rdx
    movq    %rcx, %rbx
    addq    $8, %rbx
    cmpq    %r8, %rbx
    movl    (%rdx), %esi
    jne LBB0_2
    movq    __ZNSt3__14coutE@GOTPCREL(%rip), %rdi

From my limited knowledge of assembly, it looks like version b has been compiled into a std::max_element-style loop over an array stored on the stack, which is clearly less efficient than the single compare and conditional move generated for a. Why is this? What sort of optimisations can we realistically expect when using std::initializer_list?
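To make the comparison concrete, here is a minimal sketch of what the two calls amount to at the source level. The function names (max_pair, max_list, max_list_spelled_out) are hypothetical and exist only for illustration. The third function assumes the initializer-list overload behaves like dereferencing std::max_element over the list, which matches its specified result (the largest value, leftmost on ties) but is not necessarily how every standard library implements it.

```cpp
#include <algorithm>
#include <cassert>
#include <initializer_list>

// Version a: two-argument overload, a single comparison.
int max_pair(int x)
{
    return std::max(3, x);
}

// Version b: initializer-list overload, which has to walk a backing array.
int max_list(int x)
{
    return std::max({3, x});
}

// Assumption: the list overload is equivalent to this spelled-out form.
// The backing array of `il` lives as long as `il` itself.
int max_list_spelled_out(int x)
{
    std::initializer_list<int> il = {3, x};
    return *std::max_element(il.begin(), il.end());
}
```

All three functions return the same value for any x, so the difference in the assembly above is purely a matter of whether the optimiser can see through the temporary array and the loop over it.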
