Sunday, 24 November 2019

PyTorch extension: performance difference between compiling directly with g++ and building with setuptools

I wrote a C++ extension for PyTorch that implements a custom convolution function.

First, I compiled the function directly with g++ for testing; the latency was 5 milliseconds.

Second, I integrated the function into PyTorch and installed the extension with setuptools, following the steps in the PyTorch tutorial. The latency is now 16 milliseconds.

The function invocation itself takes about 1-2 ms, so why does the performance differ so much?
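For reference, here is a minimal sketch of how latency figures like these could be measured so that the two builds are compared on equal terms. It assumes a warm-up phase followed by a median over repeated calls; `measure_latency_ms` and `placeholder_workload` are hypothetical names, and the placeholder would need to be swapped for the real call into the extension.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=10, repeats=100):
    """Return the median wall-clock latency of fn() in milliseconds."""
    # Warm-up calls absorb one-time costs such as lazy initialisation.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to occasional slow outliers than the mean.
    return statistics.median(samples)


def placeholder_workload():
    # Hypothetical stand-in for the real call into the extension.
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    latency = measure_latency_ms(placeholder_workload)
    print(f"median latency: {latency:.3f} ms")
```

Timing both builds with the same harness, input sizes, and thread settings at least rules out differences in the measurement itself.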

The direct compilation with g++ was done with

g++ -pthread -mavx2 -mfma ...

and the directives in the source file include

#pragma GCC diagnostic ignored "-Wformat"

#pragma STDC FP_CONTRACT ON

#pragma GCC optimize("O3","unroll-loops","omit-frame-pointer","inline") //Optimization flags

// #pragma GCC option("arch=native","tune=native","no-zero-upper") //Enable AVX

#pragma GCC target("avx")

These directives are also present in the file built by setuptools. The "setup.py" file is

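from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name='cusconv_cpp',
    ext_modules=[
        CppExtension(
            name='cusconv_cpp',
            sources=['src/cusconv.cpp'],
            # Extra flags passed to the C++ compiler
            extra_compile_args={'cxx': ['-O3', '-pthread', '-mavx2', '-mfma']},
        )
    ],
    cmdclass={'build_ext': BuildExtension},
)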

The build log printed by setuptools is

x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fPIC -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include/torch/csrc/api/include -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include/TH -I/home/max/.local/lib/python3.6/site-packages/torch/lib/include/THC -I/usr/include/python3.6m -c src/indconv.cpp -o build/temp.linux-x86_64-3.6/src/indconv.o -O3 -pthread -mavx2 -mfma -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=indconv_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11

The command does include those flags, but many other flags are passed as well. Does anyone have any ideas?
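To make the logged command easier to inspect, here is a small sketch that groups its flags by kind. It relies on the documented GCC behaviour that the last `-O` option on the command line is the effective one. `summarize_flags` is a hypothetical helper, and the command string is the logged one with the `-I` include paths left out for brevity. It only reads the command line; it does not by itself explain the slowdown.

```python
import shlex

# The logged compile command, with the -I include paths omitted for brevity.
LOGGED_COMMAND = (
    "x86_64-linux-gnu-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -g "
    "-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time "
    "-D_FORTIFY_SOURCE=2 -fPIC -c src/indconv.cpp "
    "-o build/temp.linux-x86_64-3.6/src/indconv.o "
    "-O3 -pthread -mavx2 -mfma -DTORCH_API_INCLUDE_EXTENSION_H "
    "-DTORCH_EXTENSION_NAME=indconv_cpp -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11"
)


def summarize_flags(command):
    """Group the flags of a GCC-style compile command by kind."""
    tokens = shlex.split(command)
    opt_levels = [t for t in tokens if t.startswith("-O")]
    return {
        "opt_levels": opt_levels,
        # GCC uses the last -O option when several are given.
        "effective_opt": opt_levels[-1] if opt_levels else None,
        "machine": [t for t in tokens if t.startswith("-m")],
        "f_flags": [t for t in tokens if t.startswith("-f")],
        "defines": [t[2:] for t in tokens if t.startswith("-D")],
    }


if __name__ == "__main__":
    for kind, value in summarize_flags(LOGGED_COMMAND).items():
        print(f"{kind}: {value}")
```

For the logged command this reports `-O3` as the effective optimization level and `-mavx2`, `-mfma` as the machine flags, so the requested flags are in effect. The remaining differences are the flags setuptools adds by default, such as `-fwrapv`, `-fstack-protector-strong`, `-fPIC`, and `-D_FORTIFY_SOURCE=2`.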
