Openmp outperforms the serial code by factor x2, but I would like to have a better performance if it is possible.
Here is the serial code in c++:
for (int k = 0; k < numelem[i]; k++)
{
sumK = sumK - weight[k];
int cmax = 0;
cmax = max(capacity - sumK, weight[k]);
for (int c = capacity; c >= cmax; c--)
{
if (f[c] < f[c - weight[k]] + value[k])
{
f[c] = f[c - weight[k]] + value[k];
M[capacity * k + c] = 1;
}
}
}
For the openmp version, I use two f0,f1 arrays which are swapped at each iteration. This helps me to prevent the race condition, but I suppose that false sharing is still present (not sure). Other my supposition is that, the conditional statements inside pragma for slow down the execution.
#pragma omp parallel
{
for (int k = 0; k < numelem[i]; k++) {
sumK = sumK - weight[k];
int cmax = 0;
cmax = max(capacity - sumK, weight[k]);
int c = capacity;
if (k % 2 == 0) {
#pragma omp for
for (c = capacity; c >= cmax; c--) {
//FALSE SHARING???
if (f0[c] < f0[c - weight[k]] + value[k]) {
f1[c] = f0[c - weight[k]] + value[k];
M[capacity * k + c] = 1;
} else {
f1[c] = f0[c];
}
}
}
else {
#pragma omp for
for (c = capacity; c >= cmax; c--) {
//FALSE SHARING???
if (f1[c] < f1[c - weight[k]] + value[k]) {
f0[c] = f1[c - weight[k]] + value[k];
M[capacity * k + c] = 1;
} else {
f0[c] = f1[c];
}
}
}
}
}
Here you can find the full code for serial c++ and openmp c++
This work is based on this article:
- Solving knapsack problems on GPU by V. Boyera, D. El Baza, M. Elkihel
- related work: Accelerating the knapsack problem on GPUs by Bharath Suri
Aucun commentaire:
Enregistrer un commentaire