Thursday, February 25, 2016

Why does C++ matrix multiplication with a dynamic array run faster than the std::vector version?

I am implementing C++ matrix multiplication with different data structures and techniques (vectors, arrays and OpenMP), and I found a strange situation: my dynamic array version runs faster.

Times:

    openmp mult_1: time: 5.882000 s
    array mult_2: time: 1.478000 s

My compilation flags are:

    /usr/bin/g++ -fopenmp -pthread -std=c++1y -O3

C++ vector version

    #include <vector>

    typedef std::vector<std::vector<float>> matrix_f;

    // result += matrixOne * matrixTwo for square matrices;
    // the outer (row) loop is split across OpenMP threads.
    void mult_1(const matrix_f & matrixOne, const matrix_f & matrixTwo, matrix_f & result) {
        const int matrixSize = (int)result.size();
        #pragma omp parallel for simd
        for (int rowResult = 0; rowResult < matrixSize; ++rowResult) {
            for (int colResult = 0; colResult < matrixSize; ++colResult) {
                for (int k = 0; k < matrixSize; ++k) {
                    result[rowResult][colResult] += matrixOne[rowResult][k] * matrixTwo[k][colResult];
                }
            }
        }
    }
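The two timings compare more than the container: mult_1 uses std::vector<std::vector<float>> (each row is a separate heap allocation reached through an extra indirection) plus an OpenMP pragma, while the dynamic array version below uses one contiguous buffer on a single thread. One way to isolate the container is a single-threaded kernel with the same i-j-k loops over one flat, row-major std::vector<float>. This is a sketch; the name mult_flat is hypothetical and not from the original post.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical single-threaded kernel: the same i-j-k loops as mult_1, but all
// three size x size matrices live in one contiguous, row-major std::vector<float>
// (element (r, c) is at index r * size + c), like the float* buffers below.
void mult_flat(const std::vector<float>& matrixOne,
               const std::vector<float>& matrixTwo,
               std::vector<float>& result, int size) {
    const std::size_t n = static_cast<std::size_t>(size);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            for (std::size_t k = 0; k < n; ++k) {
                // result += matrixOne * matrixTwo, accumulated in place
                result[n * row + col] += matrixOne[n * row + k] * matrixTwo[n * k + col];
            }
        }
    }
}
```

Timing this next to mult_2, and next to mult_1 with the pragma removed, would separate the cost of the storage layout from the effect of OpenMP.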

Dynamic array version

    // result += matrixOne * matrixTwo; all three are size x size, stored row-major.
    // Single-threaded (no OpenMP pragma).
    void mult_2(const float * matrixOne, const float * matrixTwo, float * result, int size) {
        for (int row = 0; row < size; ++row) {
            for (int col = 0; col < size; ++col) {
                for (int k = 0; k < size; ++k) {
                    result[size * row + col] += matrixOne[size * row + k] * matrixTwo[size * k + col];
                }
            }
        }
    }
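Both kernels read matrixTwo down a column in the innermost loop (index k advances by a whole row, i.e. a stride of size floats), which tends to be cache-unfriendly for large matrices. A commonly used variant, not part of the original post, swaps the two inner loops (i-k-j order) so the innermost loop walks matrixTwo and result through consecutive addresses. Each result element still receives its contributions in increasing k, so the additions happen in the same order as in mult_2. A sketch; mult_ikj is a hypothetical name.

```cpp
// Hypothetical i-k-j variant of mult_2: result += matrixOne * matrixTwo,
// size x size, row-major. The innermost loop runs along a row of matrixTwo
// and result, so memory is read and written contiguously.
void mult_ikj(const float* matrixOne, const float* matrixTwo, float* result, int size) {
    for (int row = 0; row < size; ++row) {
        for (int k = 0; k < size; ++k) {
            const float a = matrixOne[size * row + k];  // reused for the whole row
            for (int col = 0; col < size; ++col) {
                result[size * row + col] += a * matrixTwo[size * k + col];
            }
        }
    }
}
```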

Tests:

C++ vector version

    utils::ChronoTimer timer;

    /* set up simple matrices (value-initialized, i.e. all zeros) */
    utils::matrix::matrix_f matr1(size, std::vector<float>(size));
    fillRandomMatrix(matr1);

    utils::matrix::matrix_f matr2(size, std::vector<float>(size));
    fillRandomMatrix(matr2);

    // result must start at zero, because mult_1 accumulates with +=
    utils::matrix::matrix_f result(size, std::vector<float>(size));
    timer.init();
    utils::matrix::mult_1(matr1, matr2, result);
    std::printf("openmp mult_1: time: %f ms\n", timer.now() / 1000);
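utils::ChronoTimer is not shown in the post. If it happens to be built on std::clock(), it reports processor time, which on Linux (glibc) is the CPU time of all the program's threads added together, so an OpenMP kernel can look slower than it really is. A minimal wall-clock stand-in, assuming now() should return milliseconds (which would make timer.now() / 1000 a value in seconds); the class name WallTimer is hypothetical.

```cpp
#include <chrono>

// Hypothetical stand-in for utils::ChronoTimer (its real implementation is not
// shown). It measures elapsed wall-clock time with std::chrono::steady_clock,
// so time spent in parallel OpenMP threads is not summed up.
class WallTimer {
public:
    // Restart the measurement.
    void init() { start_ = std::chrono::steady_clock::now(); }

    // Milliseconds elapsed since the last init(), as a double.
    double now() const {
        const auto elapsed = std::chrono::steady_clock::now() - start_;
        return std::chrono::duration<double, std::milli>(elapsed).count();
    }

private:
    std::chrono::steady_clock::time_point start_ = std::chrono::steady_clock::now();
};
```

steady_clock is monotonic, so the measurement is not affected by system clock adjustments during a run.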

Dynamic array version

    utils::ChronoTimer timer;

    float *p_matr1 = new float[size*size];
    float *p_matr2 = new float[size*size];
    // the trailing () value-initializes the array to zeros; mult_2 accumulates
    // with +=, so without it the result would start from indeterminate values
    float *p_result = new float[size*size]();

    fillRandomMatrixArray(p_matr1, size);
    fillRandomMatrixArray(p_matr2, size);

    timer.init();
    utils::matrix::mult_2(p_matr1, p_matr2, p_result, size);
    std::printf("array mult_2: time: %f ms\n", timer.now() / 1000);

    delete [] p_matr1;
    delete [] p_matr2;
    delete [] p_result;
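fillRandomMatrix and fillRandomMatrixArray are not shown in the post either. A minimal sketch of both, assuming uniformly distributed values between 0 and 1; the fixed default seed (also an assumption) gives both tests identical input in row-major order, so the two results can be compared element by element.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical version of the array helper: fills a size x size row-major
// buffer with pseudo-random floats between 0 and 1.
void fillRandomMatrixArray(float* m, int size, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    const std::size_t count = static_cast<std::size_t>(size) * size;
    for (std::size_t i = 0; i < count; ++i) {
        m[i] = dist(gen);
    }
}

// Hypothetical version of the vector-of-vectors helper. It draws from the same
// generator and seed in row-major order, so it produces the same values.
void fillRandomMatrix(std::vector<std::vector<float>>& m, unsigned seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);
    for (auto& row : m) {
        for (float& x : row) {
            x = dist(gen);
        }
    }
}
```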

I checked some previous posts (link, link2, link3), but I couldn't find any related to my problem.
