Monday, 27 March 2017

Performance of getter methods compared to direct access in C++

At our university I heard the myth that models in the physics department are often prototyped in Python, developed further in C++ and, once ready and approved for shipping, rewritten in C or FORTRAN.

So I started thinking about ways in which a C++ program is slower than its C counterpart.

The first question I asked myself was what it costs to call a getter method compared to directly accessing a member of a C(++) object or struct.

I wrote the simplest possible test program to satisfy my naive thirst for knowledge ;)

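class AccessObj
{
public:
    int a{5};       // public member, so it can be read directly
    AccessObj(){}

    // Trivial getter returning a copy of the member
    int getA(){
        return a;
    }
};

int main(int argc, char const *argv[]){
    AccessObj containsA;
    int returnedA{0};

    // Read the member 2^31 - 1 times, either directly or through the getter
    for (int i = 0; i < 2147483647; ++i){

        #ifndef OOLIKE
            returnedA=containsA.a;          // direct access (default build)
        #else
            returnedA=containsA.getA();     // getter call (built with -D OOLIKE)
        #endif

    }
    return 0;
}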

I compiled both variants with g++ obj.cpp -o raw -std=c++11 && g++ obj.cpp -o obj -std=c++11 -D OOLIKE, using gcc 6.2.0-10 on Debian. No optimization flag is given, so GCC uses its default level, -O0.

Adding -S to the compiler flags (g++ obj.cpp -o raw.s -std=c++11 -S && g++ obj.cpp -o obj.s -std=c++11 -D OOLIKE -S) produces the assembly output: 61 lines for the direct-access build (raw) versus 84 lines for the getter build (obj).

The loop is the crucial part here. For direct access (./raw) it looks like this:

.L4:
    cmpl    $2147483647, -4(%rbp)
    je  .L3
    movl    -16(%rbp), %eax
    movl    %eax, -8(%rbp)
    addl    $1, -4(%rbp)
    jmp .L4
.L3:

and for the getter approach:

.L6:
    cmpl    $2147483647, -4(%rbp)
    je  .L5
    leaq    -16(%rbp), %rax
    movq    %rax, %rdi
    call    _ZN9AccessObj4getAEv
    movl    %eax, -8(%rbp)
    addl    $1, -4(%rbp)
    jmp .L6
.L5:

with a call to:

.LFB3:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    %rdi, -8(%rbp)
    movq    -8(%rbp), %rax
    movl    (%rax), %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
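To make the listing above easier to read: in the loop, the leaq / movq %rax, %rdi pair places the address of containsA in %rdi, and the called function reads the member through that pointer. In other words, the getter behaves roughly like a plain function that takes the object's address as an explicit argument. The following is only a sketch of that idea; getA_free is a hypothetical name that does not appear in the test program.

```cpp
// Same data layout as the class in the test program.
class AccessObj
{
public:
    int a{5};

    int getA(){
        return a;
    }
};

// Hypothetical free-function equivalent of getA(): the implicit `this`
// pointer of the member function becomes an explicit parameter.
int getA_free(AccessObj* self){
    return self->a;
}
```

Both getA_free(&containsA) and containsA.getA() read the same int through a pointer to the object, which is what movl (%rax), %eax does in the function body.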

That leaves direct access with 6 instructions per iteration, fewer than half of the roughly 15 instructions the getter version executes (8 in the loop plus 7 in the called function).

Does that mean direct access is 2.5 times faster? Is there an object-oriented way to be more competitive, or will the method always be much slower because of the function-call overhead?
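Counting instructions in unoptimized output is not a reliable proxy for run time, so one way to approach these questions is to time both variants. Because getA() is defined inside the class body it is implicitly inline, and with optimization enabled (for example -O2) GCC will typically replace the call with the plain member read. The sketch below is a minimal timing harness under my own assumptions: timeLoop, timeDirect and timeGetter are hypothetical helper names, getA() is marked const, and a volatile sink is used so that the optimizer cannot delete the loop altogether.

```cpp
#include <chrono>
#include <cstdint>

// Same member layout as in the test program; getA() is const here
// (an assumption of this sketch) so it can be called on a const reference.
class AccessObj
{
public:
    int a{5};

    int getA() const {
        return a;
    }
};

// Hypothetical helper: calls `read` `iterations` times and returns the
// elapsed wall-clock time in seconds. The volatile sink forces one store
// per iteration, so the loop survives optimization.
template <typename Read>
double timeLoop(std::int64_t iterations, Read read){
    volatile int sink = 0;
    const auto start = std::chrono::steady_clock::now();
    for (std::int64_t i = 0; i < iterations; ++i){
        sink = read();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Times direct member access.
double timeDirect(const AccessObj& obj, std::int64_t iterations){
    return timeLoop(iterations, [&obj]{ return obj.a; });
}

// Times access through the getter.
double timeGetter(const AccessObj& obj, std::int64_t iterations){
    return timeLoop(iterations, [&obj]{ return obj.getA(); });
}
```

Calling timeDirect(containsA, n) and timeGetter(containsA, n) from main, once in a build without optimization flags and once with -O2, should show whether the gap seen in the assembly survives optimization. The numbers depend on the machine and compiler, so I am not quoting any here.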

Thank you for further thoughts!
