I was messing around on http://ift.tt/1GF9raf when I observed something peculiar. Consider the following function:
#include <algorithm>
#include <cstdlib>
#include <functional>
#include <numeric>   // needed for std::accumulate

float dot(float src1[], float src2[], int size) {
    float* vecmul = static_cast<float*>(malloc(size * sizeof(float)));
    float dotprod = 0;
    // element-wise products into the temporary buffer
    std::transform(src1, src1 + size, src2, vecmul, std::multiplies<float>());
    // sum the products
    dotprod = std::accumulate(vecmul, vecmul + size, 0);
    free(vecmul);
    return dotprod;
}
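For reference, a minimal driver that exercises dot (the test values here are arbitrary and not part of the original snippet):

#include <cstdio>

int main() {
    float a[] = {1.0f, 2.0f, 3.0f};
    float b[] = {4.0f, 5.0f, 6.0f};
    // expected dot product: 1*4 + 2*5 + 3*6 = 32
    std::printf("%f\n", dot(a, b, 3));
    return 0;
}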
With flags -O3 -std=c++11 on x86 gcc 4.9.2, the dot function gets compiled down to:
dot(float*, float*, int):
// load args, do multiplication from std::transform (with mulss)
.L22:
pxor %xmm0, %xmm0
addq $4, %rcx
cvtsi2ss %edx, %xmm0 *
addss -4(%rcx), %xmm0 *
cmpq %rcx, %rsi *
cvttss2si %xmm0, %edx *
jne .L22
pxor %xmm0, %xmm0
cvtsi2ss %edx, %xmm0
.L4:
// pop arguments, free, etc.
I'm curious as to why we have an int-to-float conversion, a float addition, and then a truncation back to int (the asterisked lines). Why would this round trip be faster than a direct fadd?
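For contrast, here is a reduced sketch of just the reduction step (my own test case, not from the original code). std::accumulate deduces its accumulator type from the initial value, so passing the int literal 0 gives an int accumulator while 0.0f gives a float one, which seems likely to be what forces the conversions above:

#include <numeric>

// Accumulator type is deduced from the initial value.
float sum_int_init(const float* v, int size) {
    // int accumulator: each partial sum is truncated back to an int
    return std::accumulate(v, v + size, 0);
}

float sum_float_init(const float* v, int size) {
    // float accumulator: plain float additions throughout
    return std::accumulate(v, v + size, 0.0f);
}

Comparing the code generated for these two functions with the same compiler and flags should show whether the int initial value is what produces the asterisked instructions.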