mercredi 5 avril 2023

How to optimize boost heaps to outperform std multiset in heap operations? C++

I have worked on implementing the fast marching method. It's a computational method to solve a special type of differential equation. Anyways, to achieve a O(n lg n) running time requires the use of a priority queue supporting get_min(), extract_min(), decrease_key() and insert() (or push()). Moreover, the algorithm in question will use n insert() operations, n extract_min() operations and at worst 4n decrease_key() operations. Now, it would seem to me that heap such as the fibonacci_heap in the boost library would outperform an std set by a lot which supports the same operations (decrease key is implemented by erasing the element and readding it). However, this is not the case and I was wondering why?

(I would like to note that it is not possible to use the std priority_queue, as it does not support decrease_key())

Here is the code for the main loop. Before this loop, the heap is initialized with a few elements in the narrow band i.e. only push operations are used

// FMM loop, the heap represents the state of the narrow band
void loopFMVel() {
    int count = 1;
    while(distances.size() > 0) {
        // extract closest
        struct treedist temp = distances.top();
        int row = temp.row; int column = temp.col;
        distances.pop();
        V[row][column].state = 2;

        // go through neighbours
        for (int k = 0; k < 4; k++) {
            // Valid neighbour check
            if (row + dx[k] >= rows || row + dx[k] < 0 || column + dy[k] >= columns || column + dy[k] < 0) continue;
            // Neighbour handled check
            if (V[row + dx[k]][column + dy[k]].state == 2) continue;
            double d;
            // Update neighbour if new distance closer
            if (V[row + dx[k]][column + dy[k]].state == 1) {
                d = computeDist(row + dx[k], column + dy[k]);
                if (d < (*V[row + dx[k]][column + dy[k]].it).d) {
                    (*V[row + dx[k]][column + dy[k]].it).d = d;
                    (*V[row + dx[k]][column + dy[k]].it).known = (*V[row + dx[k]][column + dy[k]].it).known + 1;
                    V[row + dx[k]][column + dy[k]].d = d;
                    distances.increase(V[row + dx[k]][column + dy[k]].it);
                }
            } else if (V[row + dx[k]][column + dy[k]].state == 0) {
                // Add neighbour to narrow band
                d = computeDist(row + dx[k], column + dy[k]);
                struct treedist t;
                t.d = d; t.row = row + dx[k]; t.col = column + dy[k]; t.ts = count; t.known = 1;
                V[row + dx[k]][column + dy[k]].state = 1;
                V[row + dx[k]][column + dy[k]].d = d;
                V[row + dx[k]][column + dy[k]].it = distances.push(t);
            }
        }
        count++;
    }
    
    std::cout << "Finished with size " << distances.size() << std::endl;
}

The following are some results from my testing:

N = 16001^2 (I ran these tests on an M1 max with flags -Ofast -fno-finite-math-only -march=armv8.5-a -mcpu=native -ffast-math) binary_heap (with reserved memory): 58.35s multiset: 63.33s fibonacci_heap: 73.43 16_ary_heap (with reserved memory): 65.40s pairing_heap: 116.5s

I profiled using time. If any more details are required, I will gladly add them.

Aucun commentaire:

Enregistrer un commentaire