Thursday, 29 January 2015

Convenient vs. efficient way of sending a nested std::vector with MPI

I'm wondering about two different ways of sending a nested std::vector<std::vector<T>> between processes with MPI.


Scenario


A class has two different nested STL vectors, say one storing the positions and one storing the velocities of multi-dimensional particles (yes, there is a reason I'm separating those in that class). The first index is the particle index, the second the spatial dimension (e.g. positions[3][1] is the y-coordinate of the 4th particle and velocities[3][2] is the velocity along the z-axis of the same particle).


At some point I need to communicate both positions and velocities between processes, preferably without additional buffers, i.e. the class members themselves serve as the MPI send and receive buffers. The processes have a linear mapping, i.e. 0->1->2->3->4->..., and the values process 1 receives are stored in a different location than those it sends.


Once the program has started, the number of particles (N) and the spatial dimension (D) (the same for all particles) are known and constant.


Options




  1. Derived Datatype for vector<vector<T>>


    Create (not sure yet how) an MPI derived datatype which represents the memory locations of the raw vector data (i.e. positions[:].data() and velocities[:].data()). That way I don't have to pack (i.e. copy) and unpack (i.e. copy again) the values before and after communication.


    As the location of the sending and receiving values usually changes between two subsequent communications between the same two processes - to my understanding - the derived datatype needs to be recreated before each communication.




  2. Pack & Unpack


    Before sending, create a temporary linear POD array (of size N*D*2) with the flattened and concatenated positions and velocities. After communicating this, unpack the received flattened POD array and copy the values into their desired nested vectors.


    This adds up to N*D*4 copies per communication, but it does not require creating any derived datatypes and is easy to implement and read.
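To make option 2 concrete, here is a minimal sketch of the flatten and unflatten steps. The helper names `pack` and `unpack` are hypothetical (they are not MPI or Boost functions), and the receiving nested vectors are assumed to be sized to N x D already. The MPI call itself is left out so the sketch compiles without an MPI installation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: flatten positions and velocities into one contiguous
// buffer laid out as [positions row by row | velocities row by row].
template <typename T>
std::vector<T> pack(const std::vector<std::vector<T>>& positions,
                    const std::vector<std::vector<T>>& velocities)
{
    std::size_t total = 0;
    for (const auto& row : positions)  total += row.size();
    for (const auto& row : velocities) total += row.size();

    std::vector<T> buffer;
    buffer.reserve(total);  // one allocation for the whole message
    for (const auto& row : positions)  buffer.insert(buffer.end(), row.begin(), row.end());
    for (const auto& row : velocities) buffer.insert(buffer.end(), row.begin(), row.end());
    return buffer;
}

// Hypothetical helper: copy a received flat buffer back into nested vectors
// that already have their final shape.
template <typename T>
void unpack(const std::vector<T>& buffer,
            std::vector<std::vector<T>>& positions,
            std::vector<std::vector<T>>& velocities)
{
    std::size_t expected = 0;
    for (const auto& row : positions)  expected += row.size();
    for (const auto& row : velocities) expected += row.size();
    assert(buffer.size() == expected);  // shapes must match the sender's

    auto it = buffer.begin();
    for (auto& row : positions) {
        auto next = it + static_cast<std::ptrdiff_t>(row.size());
        std::copy(it, next, row.begin());
        it = next;
    }
    for (auto& row : velocities) {
        auto next = it + static_cast<std::ptrdiff_t>(row.size());
        std::copy(it, next, row.begin());
        it = next;
    }
}
```

For T = double, the packed buffer could then be handed to MPI as `buffer.data()` with a count of `buffer.size()` and the datatype MPI_DOUBLE, and the receiver would call `unpack` on its own buffer after the matching receive completes.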




I'm tempted to go with the second option for its convenience. However, as N is expected to become very large in production, this is anything but efficient. Thus, I'm wondering how one would realize the first option without overcomplicating the whole thing.
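One possible starting point for option 1, as a sketch only: an indexed derived datatype is described by one block length and one byte address per contiguous block, and here every row of the nested vectors is one such block. The code below only gathers that description. `RowLayout`, `append_rows`, `describe` and the `Address` alias (a stand-in for MPI_Aint) are hypothetical names chosen so the sketch compiles without <mpi.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for MPI_Aint so this compiles without MPI (assumption).
using Address = std::intptr_t;

// Hypothetical description of scattered rows: one entry per row.
struct RowLayout {
    std::vector<int> block_lengths;      // number of elements in each row
    std::vector<Address> displacements;  // absolute address of each row's data
};

// Append one (length, address) pair per row of `nested`.
template <typename T>
void append_rows(std::vector<std::vector<T>>& nested, RowLayout& layout)
{
    for (auto& row : nested) {
        layout.block_lengths.push_back(static_cast<int>(row.size()));
        // With MPI available, MPI_Get_address is the portable way to
        // obtain this value; the cast is used here only to avoid <mpi.h>.
        layout.displacements.push_back(reinterpret_cast<Address>(row.data()));
    }
}

// Describe positions followed by velocities as 2*N blocks.
template <typename T>
RowLayout describe(std::vector<std::vector<T>>& positions,
                   std::vector<std::vector<T>>& velocities)
{
    assert(positions.size() == velocities.size());  // same N for both
    RowLayout layout;
    append_rows(positions, layout);
    append_rows(velocities, layout);
    return layout;
}
```

With MPI, the two arrays would go to MPI_Type_create_hindexed together with the element type (e.g. MPI_DOUBLE), the result would be committed with MPI_Type_commit, and the message would be sent as a single element of that type with MPI_BOTTOM as the buffer, since the displacements are absolute addresses. The receiver builds its own type from its own rows in the same way. Because the addresses are absolute, the type has to be rebuilt (and the old one released with MPI_Type_free) whenever the target vectors change or a row reallocates, which is the per-communication cost mentioned under option 1.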


Side Note: I'm using C++11 and boost::mpi is possible.

