Sunday, 6 August 2023

MPI_Bcast fails with EXIT CODE: 139 on large char arrays

I would appreciate it if you could help me troubleshoot the following situation.

I broadcast (on localhost) a char array in 2 consecutive steps:

  1. MPI_Bcast the size of the array

  2. MPI_Bcast the array itself

This is done via dynamic process spawning (MPI_Comm_spawn). The data communication works well until the size of the array exceeds roughly 8,375,000 elements, i.e. about 8.375 MB of data, which seems quite small given the available documentation: as far as I have read in other posts, MPI supports up to 2^31 elements per call. As soon as I exceed 8,375,000 elements, the run aborts with EXIT CODE: 139, which, if I decode it correctly, means the process died with signal 11 (SIGSEGV, since 139 = 128 + 11).
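
Just to spell out the arithmetic I am reasoning from, here is a standalone sanity check (not part of the application code, with the sizes hard-coded):

#include <climits>
#include <cstdio>

int main() {
    const long threshold = 8375000;   // number of elements at which the broadcast still works for me
    // MPI_BYTE means 1 byte per element, so the payload at the threshold is ~8.375 MB
    std::printf("payload at threshold: %.3f MB\n", threshold / 1e6);
    // MPI_Bcast takes an int count, so the hard per-call limit is INT_MAX = 2^31 - 1 elements
    std::printf("count limit per call: %d elements\n", INT_MAX);
    return 0;
}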

To investigate, I ran the code under Valgrind. The summary does not indicate anything worrisome, but I do receive various MPI-related errors starting with "Syscall param writev(vector[...]) points to uninitialised byte(s)". Here is the tail of the log (a typical invocation is shown right after it):

...
==15125== Syscall param writev(vector[...]) points to uninitialised byte(s)
==15125==    at 0x5B83327: writev (writev.c:26)
==15125==    by 0x8978FF1: MPL_large_writev (in /usr/lib/libmpi.so.12.1.6)
==15125==    by 0x8961D4B: MPID_nem_tcp_iStartContigMsg (in /usr/lib/libmpi.so.12.1.6)
==15125==    by 0x8939E15: MPIDI_CH3_RndvSend (in /usr/lib/libmpi.so.12.1.6)
==15125==    by 0x895DD69: MPID_nem_lmt_RndvSend (in /usr/lib/libmpi.so.12.1.6)
==15125==    by 0x8945FE9: MPID_Send (in /usr/lib/libmpi.so.12.1.6)
==15125==    by 0x88B1D84: MPIC_Send (in /usr/lib/libmpi.so.12.1.6)
==15125==    by 0x886EC08: MPIR_Bcast_inter_remote_send_local_bcast (in /usr/lib/libmpi.so.12.1.6)
==15125==    by 0x87C28F2: MPIR_Bcast_impl (in /usr/lib/libmpi.so.12.1.6)
==15125==    by 0x87C3183: PMPI_Bcast (in /usr/lib/libmpi.so.12.1.6)
==15125==    by 0x50B9CF5: QuanticBoost::Calculators::Exposures::Mpi::dynamic_mpi_master(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, QuanticBoost::WorkflowContext&) (in /home/ubuntu/Documents/code/quanticboostnew-build/release/lib/libCppLib.so)
==15125==    by 0x5140CBA: QuanticBoost::MpiExposureSpawnTask::execute(QuanticBoost::WorkflowContext&) (in /home/ubuntu/Documents/code/quanticboostnew-build/release/lib/libCppLib.so)
==15125==  Address 0x1ffefff524 is on thread 1's stack
==15125==  in frame #3, created by MPIDI_CH3_RndvSend (???:)
==15125==  Uninitialised value was created by a stack allocation
==15125==    at 0x8939D70: MPIDI_CH3_RndvSend (in /usr/lib/libmpi.so.12.1.6)
==15125== 
==15125== 
==15125== HEAP SUMMARY:
==15125==     in use at exit: 184 bytes in 6 blocks
==15125==   total heap usage: 364,503 allocs, 364,497 frees, 204,665,377 bytes allocated
==15125== 
==15125== LEAK SUMMARY:
==15125==    definitely lost: 0 bytes in 0 blocks
==15125==    indirectly lost: 0 bytes in 0 blocks
==15125==      possibly lost: 0 bytes in 0 blocks
==15125==    still reachable: 184 bytes in 6 blocks
==15125==         suppressed: 0 bytes in 0 blocks
==15125== Reachable blocks (those to which a pointer was found) are not shown.
==15125== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==15125== 
==15125== For counts of detected and suppressed errors, rerun with: -v
==15125== ERROR SUMMARY: 15 errors from 10 contexts (suppressed: 0 from 0)
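
For completeness, a run under Valgrind looks roughly like this; the flags are the standard ones and the master binary name is a placeholder, so treat this as an illustration rather than my exact command:

mpiexec -n 1 valgrind --leak-check=full --show-leak-kinds=all --track-origins=yes ./my_master_binary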

Could you help me identify the Valgrind errors and resolve the MPI failure with exit code 139?


Below I share minimal code snippets from the master and the worker, as well as the output with the error code.

Code snippet (master):

std::cout << "Spawning "<< dynamic_procs << " " << worker_path.string() <<std::endl;

MPI_Comm_spawn(
                worker_path.string().c_str(),
                MPI_ARGV_NULL,
                dynamic_procs,
                info,
                0,
                MPI_COMM_SELF, //intra-communication
                &intercomm,  //inter-communication
                MPI_ERRCODES_IGNORE);

        
std::cout << "\n________________ MASTER: MPI spawning starts _________________ \n" << std::endl;

// I normally send the size of the char array in the 1st Bcast
// and the array itself in a 2nd Bcast,
//
// but MPI starts failing somewhere beyond 8.375e6 elements,
// although I would expect that only beyond 2^31 array elements. Or not?

//I test the limit on the array size by overriding it manually
int in_str_len = 8.375e6;    //Up to this size everything works
//int in_str_len = 8.376e6;  //This does NOT work
//int in_str_len = 8.3765e6; //This does NOT work, and so on
      
MPI_Bcast(
                &in_str_len,    //void* data,
                1,              //int count,
                MPI_INT,        //MPI_Datatype datatype,
                MPI_ROOT,       //int root: use MPI_ROOT on the spawning side, not an ordinary rank
                intercomm       //MPI_Comm communicator
        );

//Initialize a test buffer
std::string s(in_str_len, 'x');         //It works
//char d[in_str_len+1];                 //It works too (a stack VLA)

/*
 * The 2nd MPI_Bcast will send the data to all nodes 
 */
MPI_Bcast(
                s.data(),       //void* data,
                in_str_len,     //int count,
                MPI_BYTE,       //MPI_Datatype datatype: MPI_BYTE and MPI_CHAR both work
                MPI_ROOT,       //int root: use MPI_ROOT on the spawning side, not an ordinary rank
                intercomm       //MPI_Comm communicator
               );
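
If it helps, I can also add error checking on the master side, so that any error MPI reports from MPI_Bcast is printed instead of the run simply aborting. A rough sketch (not in the code above):

MPI_Comm_set_errhandler(intercomm, MPI_ERRORS_RETURN);

int rc = MPI_Bcast(s.data(), in_str_len, MPI_BYTE, MPI_ROOT, intercomm);
if (rc != MPI_SUCCESS) {
    char msg[MPI_MAX_ERROR_STRING];
    int len = 0;
    MPI_Error_string(rc, msg, &len);
    std::cerr << "MPI_Bcast failed: " << msg << std::endl;
}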

Code snippet (worker):


std::cout << "I am in a spawned process " << rank << "/" << dynamic_procs
          << " from host " << name << std::endl;

int in_str_len;

//Receive stream size;
MPI_Bcast(
    &in_str_len,    //void* data,
    1,              //int count,
    MPI_INT,        //MPI_Datatype datatype,
    0,              //int root,
    parent          //MPI_Comm communication with parent (not MPI_COMM_WORLD)
    );

std::cout << "1st MPI_Bcast received len: "<< in_str_len * 1e-6<<"Mb" << std::endl;
MPI_Barrier(MPI_COMM_WORLD); //Tested with and without the barrier

char data[in_str_len+1];    //a variable-length array, allocated on the worker's stack

std::cout << "Create char array for 2nd MPI_Bcast with length: "<< in_str_len << std::endl;
    
MPI_Bcast(
    data,           //void* data,
    in_str_len,     //int count,
    MPI_BYTE,       //MPI_Datatype datatype,
    0,              //int root,
    parent          //MPI_Comm communication with parent (not MPI_COMM_WORLD)
    );

std::cout << "2nd MPI_Bcast received data: " << sizeof(data) << std::endl;

Error received with the large array:


Spawning 3 /home/ubuntu/Documents/code/build/release/bin/mpi_worker

________________ MASTER: MPI spawning starts _________________ 

I am in a spawned process 1/3 from host ip-172-31-30-254
I am in a spawned process 0/3 from host ip-172-31-30-254
I am in a spawned process 2/3 from host ip-172-31-30-254

1st MPI_Bcast received len: 8.3765Mb
1st MPI_Bcast received len: 8.3765Mb
1st MPI_Bcast received len: 8.3765Mb

Create char array for 2nd MPI_Bcast with length: 8376500
Create char array for 2nd MPI_Bcast with length: 8376500
Create char array for 2nd MPI_Bcast with length: 8376500

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 9690 RUNNING AT localhost
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================

PS: Let me know if you need any extra info or further editing of my post.
