Wednesday, October 19, 2016

I am having trouble understanding how to handle a heisenbug segfault in my OpenMP program

I am having trouble with a side project: a personal neural network implementation. The issue is a heisenbug segfault that occurs in a parallel section of a custom Monte Carlo algorithm I am writing.

The threads should not interact in any way in this section of the code until they reach the critical section I have defined. Somehow, though, either the memory for local variables in a function call is being overwritten by another thread, or the function call itself is overwriting memory allocated by a previous thread.

I believe the problem described in "OpenMP Causes Heisenbug Segfault" is the same one I am experiencing, but the author did not say how he fixed it, so I do not know how to apply his insight to my code.

Here is the parallel section of the code I have written. The critical section I added as a test is commented out, since it did not help with the bug. The bug occurs in the call to my_net->train_tim, where tim is short for time; bp and bps are shorthand for blueprint and blueprints.

#pragma omp parallel num_threads(num_threads) firstprivate(dist_bin, gen_bin, bps, bins, dur, sf, lr, no, ni)
{
    uint32_t my_rank = omp_get_thread_num();
    uint32_t my_si = my_rank * num_slices;
    uint32_t my_ei;
    if(my_rank == num_threads-1)
        my_ei = bps.size();
    else
        my_ei = my_si + num_slices;
    // Here we need to start the outer for loop, to iterate through every blueprint in bps.
    for(uint32_t i = my_si; i < my_ei; i++){
        // Load the blueprint into an unsigned integer array.
        uint32_t my_nl = bps[i].size();
        uint32_t* my_bp = new uint32_t[my_nl];
        for(uint32_t j = 0; j < my_nl; j++){
            my_bp[j] = bps[i][j];
        }
        // Initialize my network
        //Network* my_net = new Network(lr, my_bp, my_nl);
        // Initialize my error!
        double my_personal_err = 0.0;
        // Loop num_bin times.
        Network* my_net;
        for(uint32_t bc = 0; bc < num_bins; bc+=1){
            uint32_t my_rand_index = dist_bin(gen_bin);
            // We have our random index
            // Create vectors to hold personal examples.
            std::vector<std::vector<double> > ex_inputs;
            std::vector<std::vector<double> > ex_outputs;
            for(unsigned int bin_counter = 0; bin_counter < num_bins; bin_counter++){ // Iterate through each bin.
                // If we are on the bin_id for the test data, skip it!
                if(bin_counter == my_rand_index){
                    continue;
                }
                 // Iterate over all data in this bin
                for(unsigned int data_counter = 0; data_counter < bins[bin_counter].size(); data_counter++){
                    // Build the input and output vectors for this example as locals;
                    // push_back copies them, so nothing needs to be freed afterwards.
                    std::vector<double> ex_ips(ni);
                    std::vector<double> ex_ops(no);
                    // Iterate over the inputs and outputs, and store them in their proper structures.
                    for(unsigned int ip_counter = 0; ip_counter < ni; ip_counter++){
                        ex_ips[ip_counter] = bins[bin_counter][data_counter][ip_counter];
                    }
                    for(unsigned int op_counter = 0; op_counter < no; op_counter++){
                        ex_ops[op_counter] = bins[bin_counter][data_counter][op_counter+ni];
                    }
                    ex_inputs.push_back(ex_ips); // Push the inputs and outputs into the examples list.
                    ex_outputs.push_back(ex_ops);
                }
            }
            // Example inputs and outputs created.
            // Train the network now.
            //#pragma omp critical
            //{
            //  std::cout << "My Rank\t" << my_rank << "\ti\t=\t" << i << "\tbc:\t" << bc << '\n';
            //  std::cout << "[";
            //  for(uint32_t q = 0; q < bps[i].size(); q++){
            //      std::cout << bps[i][q];
            //      if(q != bps[i].size()-1) std::cout << ',';
            //  }
            //  std::cout << "]\n";
            my_net = new Network(lr, my_bp, my_nl);
            my_net->train_tim(ex_inputs, ex_outputs, sf, ex_inputs.size(), dur);
            //}
            // Sum the error!
            my_personal_err += test_network(my_net, bins[my_rand_index], ni, no, sf);
            delete my_net;
        }
        my_personal_err /= num_bins;
        // Now we compare this error with the other error.
        // This is where we need to enter the critical section
        // or the section in which only 1 thread can enter at
        // a time.
        #pragma omp critical
        {
            // if the new error is less than current best
            // then we replace the best's architecture with
            // this architecture.
            if(my_personal_err < best.err){
                delete best.net_ref; // Free up pre-occupied memory.
                my_net = new Network(sf, my_bp, my_nl);
                best.net_ref = my_net; // Store the reference to this network
                delete[] best.bp; // Free up pre-occupied memory (array allocated with new[]).
                best.bp = my_bp; // Store the reference to this array.
                best.nl = my_nl; // Store the number of layers for this bp.
                best.err = my_personal_err; // Store this error.
            }
            else{ // We did not best the current best error, so delete the bp.
                delete[] my_bp; // my_bp was allocated with new[].
            }
        }
    }
}
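For comparison, here is a minimal sketch of the same slice-and-compare pattern written with owning containers instead of raw new/delete, so that no thread ever frees memory another thread might still reference. It is only an illustration under stated assumptions: num_slices is taken to be bps.size() / num_threads (its definition is not shown above), and slice_bounds, Best, evaluate and search are hypothetical names, with evaluate standing in for "train a network on this blueprint and return its error".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Half-open range [begin, end) of blueprint indices owned by one thread.
// The last thread also takes the remainder, as in the loop above.
std::pair<std::size_t, std::size_t> slice_bounds(std::size_t rank,
                                                 std::size_t num_threads,
                                                 std::size_t total) {
    const std::size_t num_slices = total / num_threads;  // assumed definition
    const std::size_t begin = rank * num_slices;
    const std::size_t end = (rank == num_threads - 1) ? total : begin + num_slices;
    return {begin, end};
}

// Shared "best so far" record. The vector owns its copy of the blueprint,
// so replacing it never requires a manual delete.
struct Best {
    double err = std::numeric_limits<double>::infinity();
    std::vector<std::uint32_t> bp;
};

// Hypothetical stand-in for training and testing a network on one blueprint.
double evaluate(const std::vector<std::uint32_t>& bp) {
    double sum = 0.0;
    for (std::uint32_t n : bp) sum += n;
    return sum;
}

Best search(const std::vector<std::vector<std::uint32_t>>& bps, int num_threads) {
    Best best;  // shared between threads; only touched inside the critical section
    #pragma omp parallel for num_threads(num_threads) schedule(static, 1)
    for (int rank = 0; rank < num_threads; ++rank) {
        const auto range = slice_bounds(static_cast<std::size_t>(rank),
                                        static_cast<std::size_t>(num_threads),
                                        bps.size());
        for (std::size_t i = range.first; i < range.second; ++i) {
            const double err = evaluate(bps[i]);  // thread-local work
            #pragma omp critical
            {
                if (err < best.err) {
                    best.err = err;
                    best.bp = bps[i];  // copy assignment releases the old blueprint
                }
            }
        }
    }
    return best;
}
```

The pragmas are ignored when the file is compiled without OpenMP support, in which case the loop simply runs serially and produces the same result.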

The part of train_tim that causes my program to segfault is the get_weight call in the following for loop.

for(unsigned int l = 1; l < num_layers; l++){
    unsigned int s_j = 0;
    for(unsigned int j = 0; j < l; j++) s_j += nodes_per_layer[j];
    unsigned int e_j = s_j + nodes_per_layer[l];
    for(unsigned int j = s_j; j < e_j; j++){
        unsigned int nw = nodes[j]->get_num_weights();
        double *new_weights = new double[nw];
        unsigned int s_i = 0;
        for(unsigned int i = 0; i < l - 1; i++) s_i += nodes_per_layer[i];
        unsigned int e_i = s_i + nodes_per_layer[l-1];
        for(unsigned int i = s_i; i < e_i; i++){
            new_weights[i - s_i] = (nodes[j]->get_weight(i - s_i) + (alpha * nodes[i]->get_value() * nodes[j]->get_delta()));
        }
        nodes[j]->set_weights(new_weights, nw);
        double b = nodes[j]->get_bias();
        double nb = b + (alpha * nodes[j]->get_delta());
        nodes[j]->set_bias(nb);
    }
}

The code above is the weight-correction step of the backpropagation training algorithm in my program.
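Read directly off the loop, the update applied to node j in layer l, for each input node i in layer l-1, can be written as follows. The symbols are shorthand introduced here, not names from the code: w_{ji} is get_weight(i - s_i) on node j, v_i is get_value() on node i, \delta_j is get_delta() on node j, b_j is get_bias(), and \alpha is alpha.

```latex
w_{ji} \leftarrow w_{ji} + \alpha \, v_i \, \delta_j ,
\qquad
b_j \leftarrow b_j + \alpha \, \delta_j
```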

If anyone sees something I do not, or knows what I could potentially do to fix this issue, please let me know!

Here is sample output from Valgrind with the Helgrind tool enabled, which I believe describes the same problem.

==26386== 
==26386== Possible data race during read of size 8 at 0x6213348 by thread #1
==26386== Locks held: none
==26386==    at 0x40CB26: AeroSW::Node::get_weight(unsigned int) (Node.cpp:84)
==26386==    by 0x40E688: AeroSW::Network::train_tim(std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >, std::vector<std::vector<double, std::allocator<double> >, std::allocator<std::vector<double, std::allocator<double> > > >, double, unsigned int, unsigned long) (Network.cpp:227)
==26386==    by 0x4058F1: monte_carlo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, double, double, double, std::vector<double*, std::allocator<double*> >&) [clone ._omp_fn.0] (Validation.cpp:196)
==26386==    by 0x5462E5E: GOMP_parallel (in /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0)
==26386==    by 0x404B86: monte_carlo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int, double, double, double, std::vector<double*, std::allocator<double*> >&) (Validation.cpp:136)
==26386==    by 0x402467: main (NeuralNetworkArchitectureDriver.cpp:85)
==26386==  Address 0x6213348 is 24 bytes inside a block of size 32 in arena "client"
==26386== 
