I am trying to wrap some C++ code that calls Eigen using pybind11. I am able to successfully compile with EIGEN_USE_MKL_ALL defined. My setup.py script is given by the following:
import os, sys
import numpy as np
from distutils.core import setup, Extension
from distutils import sysconfig
args = []
args += ['-std=c++14','-lstdc++']
args += ['-O3', '-march=native','-fopenmp']
args += ['-DMKL_ILP64', '-m64', '-I${MKLROOT}/include']
args += ['-L${MKLROOT}/lib/intel64', '-Wl,--no-as-needed', '-lmkl_intel_ilp64', '-lmkl_intel_thread', '-lmkl_core', '-liomp5', '-lpthread', '-lm', '-ldl']
ext_modules = [
    Extension(
        'linear_algebra_utilities',
        ['linear_algebra_utilities.cpp'],
        extra_link_args=args,
        extra_compile_args=args,
        include_dirs=['pybind11/include', 'eigen3'],
        language='c++14',
    ),
]
setup(
    name='cpputilities',
    version='0.0.1',
    author='Benjamin Cohen-Stead',
    author_email='bwcohenstead@ucdavis.edu',
    description='Linear Algebra Utilities.',
    ext_modules=ext_modules,
)
This generates the following compiler invocations:
running build_ext
building 'linear_algebra_utilities' extension
gcc -pthread -B /home/benwcs/anaconda3/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Ipybind11/include -Ieigen3 -I/home/benwcs/anaconda3/include/python3.7m -c linear_algebra_utilities.cpp -o build/temp.linux-x86_64-3.7/linear_algebra_utilities.o -std=c++14 -lstdc++ -O3 -march=native -fopenmp -DMKL_ILP64 -m64 -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
gcc -pthread -shared -B /home/benwcs/anaconda3/compiler_compat -L/home/benwcs/anaconda3/lib -Wl,-rpath=/home/benwcs/anaconda3/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/linear_algebra_utilities.o -o /home/benwcs/Documents/matrix_stabilization/linear_algebra_utilities.cpython-37m-x86_64-linux-gnu.so -std=c++14 -lstdc++ -O3 -march=native -fopenmp -DMKL_ILP64 -m64 -I${MKLROOT}/include -L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm -ldl
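One thing I notice in the output above is that ${MKLROOT} appears literally in the gcc commands; as far as I can tell distutils does not run the compiler through a shell, so the variable is never expanded. In case that matters, the paths could be built in Python instead. A minimal sketch, assuming MKLROOT is set in the environment (the fallback path below is just a guess):
import os

# Expand MKLROOT ourselves, since distutils passes the flags to gcc verbatim.
mklroot = os.environ.get('MKLROOT', '/opt/intel/mkl')  # fallback is a guess, adjust to your install

mkl_compile_args = ['-DMKL_ILP64', '-m64', '-I' + os.path.join(mklroot, 'include')]
mkl_link_args = ['-L' + os.path.join(mklroot, 'lib', 'intel64'), '-Wl,--no-as-needed',
                 '-lmkl_intel_ilp64', '-lmkl_intel_thread', '-lmkl_core',
                 '-liomp5', '-lpthread', '-lm', '-ldl']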
The module exposes the following function, which can be called from Python and performs a QR decomposition:
void QR_to_UdV(const Eigen::Ref<const Eigen::MatrixXd> A,
               Eigen::Ref<Eigen::MatrixXd> U,
               Eigen::Ref<Eigen::VectorXd> d,
               Eigen::Ref<Eigen::MatrixXd> V){
    U = A.householderQr().householderQ();                             // Q factor
    V = A.householderQr().matrixQR().triangularView<Eigen::Upper>();  // R factor
    d = V.diagonal();                                                 // diagonal of R
    V.array().colwise() /= d.array();                                 // scale row i by 1/d(i) so diag(V) = 1
}
However, when I time this function against numpy's qr decomposition it is many times slower:
- Eigen+MKL QR: 238ms
- numpy QR: 41ms
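For reference, the comparison was along these lines (the matrix size below is illustrative, not the exact benchmark; the arrays are allocated in Fortran order because Eigen's default storage is column-major):
import time
import numpy as np
import linear_algebra_utilities as lau

n = 1000
A = np.asfortranarray(np.random.rand(n, n))
U = np.zeros((n, n), order='F')   # writable Eigen::Ref arguments need column-major arrays
d = np.zeros(n)
V = np.zeros((n, n), order='F')

t0 = time.time()
lau.QR_to_UdV(A, U, d, V)
print('Eigen+MKL QR:', time.time() - t0)

t0 = time.time()
Q, R = np.linalg.qr(A)
print('numpy QR:', time.time() - t0)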
I am fairly certain this difference in runtime is because the numpy function uses multithreading but mine does not. I am including the -fopenmp compilation flag, so I do not understand why multithreading is not being used in my code, given that I have linked it against MKL.
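In case it helps, this is roughly how I am checking what numpy is built against and how the thread counts are configured (the exact output depends on the numpy build):
import os
import numpy as np

np.__config__.show()                        # shows whether numpy is linked against MKL
print(os.environ.get('MKL_NUM_THREADS'))    # unset means MKL chooses the thread count itself
print(os.environ.get('OMP_NUM_THREADS'))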
One final piece of information: I am running this code in Ubuntu on an XPS 15 9570 with a sixth generation i7 processor.
Can anyone show me how to fix my setup.py script so that I get performance comparable to numpy's?