I have a (kind of) performance problem in my code, that roots in the chosen architecture.
I will use multidimensional tensors (basically matrices with more dimensions) in the form of cubes to store my data. Since the dimension is not known at compile-time, I can't use Boost's MultidimensionalArray (IIRC), but have to come up, with my own solution.
Right now, I save each dimension, on it's own. I have a Tensor of dimension (let's say 3), that holds a lot of tensors of dimension 2 (in an std::vector), that each have a std::vector with tensors of dimension 1, that each holds a std::vector of (numerical) data. I use an abstract base-class for my tensor, so everything in there is a pointer to the abstract class, while beeing (secretly) multi- or one-dimensional.
I extract a single numerical data-point by giving a std::list of indices to a tensor, that get's the first element, searches for the according tensor and passes the rest of the list to that tensor in a (kind of) recursive call.
I now have to do a multi-dimensional Fast-Fourier Transformation on that data. I use a Threadpool and Job-Objects, that works on copying data from an Tensor along one dimension, doing an FFT and writes that data back.
I already have logic to implement ThreadPool and organize the dimensions to FFT along, but there is one problem:
My data-structure is the cache-unfriendliest beast, one can think of... While the Data-Copying along the first dimension (that, with it's data in a single 1D-Tensor) is reasonable fast, but in other directions, I need to copy my data from all over the place.
Since there are no race-conditions (I make sure every concurrent FFT is on distinct data-points), I thought, I would not use a Mutex-Guard to let everybody copy at the same time. However this heavily slows down the process ("I copy my data now!" - "No, I copy my data now!"- "But it's my turn now!"...)
Guarding the copy-Process with a mutex, does not increase speed. The FFT of a vector with 1024 elements is way faster, then the copy-process to get these elements, resulting in nearly all of my threads waiting, while one is copying.
Long story short: Is there any kind of multi-dimensional data-structure, that does not need to set the dimension at compile-time, that allows me to traverse fast along all axis? I searched for a while now, by nothing came up besides Boost MultiArray. Vectorization also does not work since the indices would grow too fast to hold in usual int-types.
I can't think of how to present code-examples here, since most of that code is rather simple, but If needed, I can get that in.
Aucun commentaire:
Enregistrer un commentaire