I have the following function that computes the maximum value of a 2D/3D array in a nested for loop. I used a reduction clause to get some additional speedup, but the speedup is poor, and I am wondering how to fix this.
Example function:
// nx and nyk are the file-scope constants defined in the main code below
double maxFunc(double arr2D[]){
    double max_val = 0.;
    #pragma omp parallel for reduction(max:max_val)
    for (int i = 0; i < nx; i++){
        for (int j = 0; j < nyk; j++){
            if (arr2D[j + nyk*i] > max_val){
                max_val = arr2D[j + nyk*i];
            }
        }
    }
    return max_val;
}
Main code:
static const int nx = 1024;
static const int ny = 1024;
static const int nyk = ny/2 + 1;
double *Array;
Array = (double*) fftw_malloc(nx*ny*sizeof(double));
for (int i = 0; i < nx; i++){
    for (int j = 0; j < ny; j++){
        Array[j + ny*i] = //Initialize array to some values;
    }
}
//test maxFunc with different number of threads
for (int nThreads = 1; nThreads <= 16; nThreads++){
    double start_time, run_time;
    start_time = omp_get_wtime();
    omp_set_num_threads(nThreads);
    double max_val = 0.;
    #pragma omp parallel for reduction(max:max_val)
    for (int i = 0; i < nx; i++){
        for (int j = 0; j < nyk; j++){
            if (Array[j + nyk*i] > max_val){
                max_val = Array[j + nyk*i];
            }
        }
    }
    run_time = omp_get_wtime() - start_time;
    cout << "Threads: " << nThreads << "Parallel Time in s: " << run_time << "s\n";
}
The output I get looks like:
Threads: 1Parallel Time in s: 0.0003244s
Threads: 2Parallel Time in s: 0.0003887s
Threads: 3Parallel Time in s: 0.0002579s
Threads: 4Parallel Time in s: 0.0001945s
Threads: 5Parallel Time in s: 0.000179s
Threads: 6Parallel Time in s: 0.0001456s
Threads: 7Parallel Time in s: 0.0002081s
Threads: 8Parallel Time in s: 0.000135s
Threads: 9Parallel Time in s: 0.0001262s
Threads: 10Parallel Time in s: 0.0001161s
Threads: 11Parallel Time in s: 0.0001499s
Threads: 12Parallel Time in s: 0.0002939s
Threads: 13Parallel Time in s: 0.0002982s
Threads: 14Parallel Time in s: 0.0002399s
Threads: 15Parallel Time in s: 0.0002283s
Threads: 16Parallel Time in s: 0.0002268s
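To put numbers on this, the timings above can be converted to speedups relative to the single-thread run. The following is only a minimal sketch: the names kMeasured, speedup, parallelEfficiency and fastestThreadCount are made up for illustration, and the values are simply copied from the output above. By this calculation the fastest run (10 threads) is roughly 2.8x faster than one thread, and the 6-thread run roughly 2.2x.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Wall-clock times in seconds, copied from the output above.
// Element k holds the run with k + 1 threads.
constexpr std::array<double, 16> kMeasured = {
    0.0003244, 0.0003887, 0.0002579, 0.0001945,
    0.000179,  0.0001456, 0.0002081, 0.000135,
    0.0001262, 0.0001161, 0.0001499, 0.0002939,
    0.0002982, 0.0002399, 0.0002283, 0.0002268};

// Speedup of the nThreads run relative to the 1-thread run.
double speedup(std::size_t nThreads) {
    assert(nThreads >= 1 && nThreads <= kMeasured.size());
    return kMeasured[0] / kMeasured[nThreads - 1];
}

// Speedup divided by thread count (1.0 would be ideal scaling).
double parallelEfficiency(std::size_t nThreads) {
    return speedup(nThreads) / static_cast<double>(nThreads);
}

// Thread count whose run had the smallest measured time.
std::size_t fastestThreadCount() {
    std::size_t best = 0;
    for (std::size_t k = 1; k < kMeasured.size(); ++k) {
        if (kMeasured[k] < kMeasured[best]) best = k;
    }
    return best + 1;
}
```

Calling speedup(6) and parallelEfficiency(6) gives the figures for the 6-thread run. Each value comes from a single run of a few hundred microseconds, so they should be read as rough indications only.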
My PC has 6 cores with 12 logical processors, so I would expect roughly a 6x speedup in the best case. Thanks!
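In case it helps to reproduce this, here is a minimal self-contained sketch of the same measurement. It rests on a few assumptions that are not in my code above: it uses std::vector in place of fftw_malloc, it passes nx and nyk as parameters, and it times with std::chrono so that it also builds without OpenMP (the pragma is then ignored and the loop runs serially). The names maxOfArray and bestTime are hypothetical. Because a single run only lasts a few hundred microseconds, the sketch makes one untimed call first and then keeps the smallest of several timed repetitions.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Same max reduction as maxFunc, with the sizes passed in explicitly.
// Like the original, it starts from 0, so it assumes non-negative data.
double maxOfArray(const std::vector<double>& a, int nx, int nyk) {
    assert(a.size() >= static_cast<std::size_t>(nx) * nyk);
    double max_val = 0.;
    #pragma omp parallel for reduction(max:max_val)
    for (int i = 0; i < nx; i++) {
        for (int j = 0; j < nyk; j++) {
            if (a[j + nyk*i] > max_val) {
                max_val = a[j + nyk*i];
            }
        }
    }
    return max_val;
}

// Smallest wall time in seconds over `reps` timed calls,
// taken after one untimed call.
double bestTime(const std::vector<double>& a, int nx, int nyk,
                int nThreads, int reps) {
#ifdef _OPENMP
    omp_set_num_threads(nThreads);
#else
    (void)nThreads;  // no OpenMP: thread count has no effect
#endif
    maxOfArray(a, nx, nyk);  // untimed first call
    double best = 1e300;
    for (int r = 0; r < reps; r++) {
        auto t0 = std::chrono::steady_clock::now();
        volatile double result = maxOfArray(a, nx, nyk);
        (void)result;  // keep the call from being optimized away
        auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}
```

With OpenMP enabled (for example -fopenmp on GCC), calling bestTime(data, 1024, 513, n, 20) for n from 1 to 16 on a vector of at least 1024*513 doubles would mirror the loop in my main code. The repetition count of 20 is an arbitrary choice.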