Sunday, September 17, 2023

OpenMP: Finding the maximum value of an array using the reduction clause

I have the following function that calculates the maximum value of a 2D/3D array in a nested for loop. I used the reduction clause to get some additional speedup, but the speedup I actually get is poor. How can I fix this?

Example Function:

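#include <limits>

// nx and nyk are the global constants from the main code; they must be
// declared before this function.
double maxFunc(const double arr2D[]){

    // Start from the lowest double rather than 0., so the result is also
    // correct when every element is negative.
    double max_val = std::numeric_limits<double>::lowest();
    // Each thread keeps a private copy of max_val; at the end of the loop
    // OpenMP combines the copies with max (requires OpenMP 3.1 or later).
    #pragma omp parallel for reduction(max:max_val)
    for (int i = 0; i < nx; i++){
        for (int j = 0; j < nyk; j++){
            if (arr2D[j + nyk*i] > max_val){
                max_val = arr2D[j + nyk*i];
            }
        }
    }
    return max_val;
}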

Main Code:

static const int nx = 1024;
static const int ny = 1024;
static const int nyk = ny/2 + 1;

double *Array;
Array = (double*) fftw_malloc(nx*ny*sizeof(double)); 

for (int i = 0; i < nx; i++){
    for (int j = 0; j < ny; j++){
        Array[j + ny*i] = ...;  // initialize the array to some values
    }
}

// Test the same max reduction as maxFunc (inlined here) with different numbers of threads
for (int nThreads = 1; nThreads <= 16; nThreads++){
    double start_time, run_time;
    start_time = omp_get_wtime();

    omp_set_num_threads(nThreads);
    double max_val = std::numeric_limits<double>::lowest();
    #pragma omp parallel for reduction(max:max_val)
    for (int i = 0; i < nx; i++){
        for (int j = 0; j < nyk; j++){
            if (Array[j + nyk*i] > max_val){
                max_val = Array[j + nyk*i];
            }
        }
    }
    run_time = omp_get_wtime() - start_time;
    cout << "Threads: " << nThreads << "Parallel Time in s: " << run_time << "s\n";
}

The output I get looks like:

Threads: 1Parallel Time in s: 0.0003244s
Threads: 2Parallel Time in s: 0.0003887s
Threads: 3Parallel Time in s: 0.0002579s
Threads: 4Parallel Time in s: 0.0001945s
Threads: 5Parallel Time in s: 0.000179s
Threads: 6Parallel Time in s: 0.0001456s
Threads: 7Parallel Time in s: 0.0002081s
Threads: 8Parallel Time in s: 0.000135s
Threads: 9Parallel Time in s: 0.0001262s
Threads: 10Parallel Time in s: 0.0001161s
Threads: 11Parallel Time in s: 0.0001499s
Threads: 12Parallel Time in s: 0.0002939s
Threads: 13Parallel Time in s: 0.0002982s
Threads: 14Parallel Time in s: 0.0002399s
Threads: 15Parallel Time in s: 0.0002283s
Threads: 16Parallel Time in s: 0.0002268s
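From these timings, the speedup relative to one thread, S(p) = T(1)/T(p), can be worked out directly. The values below use only the numbers listed above, rounded:

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad
S(6) = \frac{0.3244\,\text{ms}}{0.1456\,\text{ms}} \approx 2.23, \qquad
S(10) = \frac{0.3244\,\text{ms}}{0.1161\,\text{ms}} \approx 2.79, \qquad
S(12) = \frac{0.3244\,\text{ms}}{0.2939\,\text{ms}} \approx 1.10
```

So the best measured speedup is about 2.8× (at 10 threads), and the parallel efficiency at 6 threads is roughly 2.23 / 6 ≈ 0.37. Since each figure comes from a single run, differences of a few tens of microseconds between neighbouring thread counts may well be noise.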

My PC has 6 physical cores (12 logical processors), so in the best case I was expecting roughly a 6× speedup. Thanks!
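One factor worth checking before expecting a 6× speedup is how much data each pass reads compared with how little work it does per element (a single comparison). With the sizes from the main code, the back-of-envelope arithmetic is:

```latex
N = n_x \cdot n_{yk} = 1024 \times 513 = 525\,312 \text{ doubles}, \qquad
8N = 4\,202\,496 \text{ bytes} \approx 4.2\,\text{MB}

\frac{4.2\,\text{MB}}{0.3244\,\text{ms}} \approx 13\,\text{GB/s (1 thread)}, \qquad
\frac{4.2\,\text{MB}}{0.1161\,\text{ms}} \approx 36\,\text{GB/s (10 threads)}
```

A loop like this is typically limited by memory or cache bandwidth rather than by arithmetic, and that bandwidth does not grow in proportion to the number of cores; the 12 logical processors also share the 6 physical cores. Whether the ~4.2 MB stays in the last-level cache between the repeated runs depends on the CPU, so these rates are an estimate, not a measurement of the machine's actual bandwidth.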
