mercredi 1 mars 2017

Traverse of multidimensional Array in any axis

I have a (kind of) performance problem in my code, that roots in the chosen architecture.

I will use multidimensional tensors (basically matrices with more dimensions) in the form of cubes to store my data. Since the dimension is not known at compile-time, I can't use Boost's MultidimensionalArray (IIRC), but have to come up, with my own solution.

Right now, I save each dimension, on it's own. I have a Tensor of dimension (let's say 3), that holds a lot of tensors of dimension 2 (in an std::vector), that each have a std::vector with tensors of dimension 1, that each holds a std::vector of (numerical) data. I use an abstract base-class for my tensor, so everything in there is a pointer to the abstract class, while beeing (secretly) multi- or one-dimensional.

I extract a single numerical data-point by giving a std::list of indices to a tensor, that get's the first element, searches for the according tensor and passes the rest of the list to that tensor in a (kind of) recursive call.

I now have to do a multi-dimensional Fast-Fourier Transformation on that data. I use a Threadpool and Job-Objects, that works on copying data from an Tensor along one dimension, doing an FFT and writes that data back.

I already have logic to implement ThreadPool and organize the dimensions to FFT along, but there is one problem:

My data-structure is the cache-unfriendliest beast, one can think of... While the Data-Copying along the first dimension (that, with it's data in a single 1D-Tensor) is reasonable fast, but in other directions, I need to copy my data from all over the place.

Since there are no race-conditions (I make sure every concurrent FFT is on distinct data-points), I thought, I would not use a Mutex-Guard to let everybody copy at the same time. However this heavily slows down the process ("I copy my data now!" - "No, I copy my data now!"- "But it's my turn now!"...)

Guarding the copy-Process with a mutex, does not increase speed. The FFT of a vector with 1024 elements is way faster, then the copy-process to get these elements, resulting in nearly all of my threads waiting, while one is copying.

Long story short: Is there any kind of multi-dimensional data-structure, that does not need to set the dimension at compile-time, that allows me to traverse fast along all axis? I searched for a while now, by nothing came up besides Boost MultiArray. Vectorization also does not work since the indices would grow too fast to hold in usual int-types.


I can't think of how to present code-examples here, since most of that code is rather simple, but If needed, I can get that in.

Aucun commentaire:

Enregistrer un commentaire