Monday, May 29, 2017

OpenMP: writing data synchronized with HDF5

I am working on a project in which a large dataset has to be created using HDF5. The naive implementation works, but it is very slow. The bottleneck is the calculation (10x slower than the write), which I cannot speed up any further, so parallelization might be an option.

I guess I could use a simple #pragma omp parallel for, but the dataspace.write(..) method should be called sequentially for speed reasons (maybe it doesn't matter). See this picture for example.

[Figure: thread timelines for three variants: naive implementation, parallel implementation, and a different implementation with a queue (e.g. queue size = 4).]

Legend:
T          Thread
<calcn---> Calculation time
<Wn>       Write data n. Order *important*
.          Waiting

#include <chrono>
#include <cmath>
#include <iostream>
#include <memory>

double calculate(float *buf, const struct options *opts) {
  // dummy function just to get a time reference;
  // the real one populates buf according to opts
  double res = 0;
  for (size_t i = 0; i < 10000; i++)
    res += std::sin(i);
  return 1 / (1 + res);
}

struct options {
  size_t idx[6]; // start index of the current block in each dimension
};

class Dataspace {
public:
  void selectHyperslab() {}  // selects region in disk space
  void write(float *buf) {}  // write buf to selected disk space
};

int main() {
  size_t N = 6;
  size_t dims[6] = {4 * N, 4 * N, 4 * N, 4 * N, 4 * N, 4 * N},
         buf_offs[6] = {4, 4, 4, 4, 4, 4};
  // dims: size of each dimension, multiple of 4
  // buf_offs: size of buffer in each dimension

  // Calculate buffer size and allocate
  // the size of the buffer is usually around 1 MB
  // and not a float but a compound datatype
  size_t buf_size = 1; // start at 1 so each dimension is counted once
  for (auto off : buf_offs)
    buf_size *= off;
  std::unique_ptr<float[]> buf{new float[buf_size]};

  struct options opts;        // options parameters, passed to calculation fun
  struct Dataspace dataspace; // dummy Dataspace. Supplied by HDF5

  size_t i = 0;
  size_t idx0, idx1, idx2, idx3, idx4, idx5;
  auto t_start = std::chrono::high_resolution_clock::now();
  std::cout << "[START]" << std::endl;
  for (idx0 = 0; idx0 < dims[0]; idx0 += buf_offs[0])
    for (idx1 = 0; idx1 < dims[1]; idx1 += buf_offs[1])
      for (idx2 = 0; idx2 < dims[2]; idx2 += buf_offs[2])
        for (idx3 = 0; idx3 < dims[3]; idx3 += buf_offs[3])
          for (idx4 = 0; idx4 < dims[4]; idx4 += buf_offs[4])
            for (idx5 = 0; idx5 < dims[5]; idx5 += buf_offs[5]) {
              opts.idx[0] = idx0;
              opts.idx[1] = idx1;
              opts.idx[2] = idx2;
              opts.idx[3] = idx3;
              opts.idx[4] = idx4;
              opts.idx[5] = idx5;

              dataspace.selectHyperslab(/**/); // function from HDF5
              calculate(buf.get(), &opts);     // populate buf with data
              dataspace.write(buf.get());      // has to be sequential
              i++;                             // count the calls
            }
  std::cout << "[DONE] " << i << " calls" << std::endl;
  std::chrono::duration<double> diff =
      std::chrono::high_resolution_clock::now() - t_start;
  std::cout << "Time: " << diff.count() << std::endl;
  return 0;
}

Code should work right out of the box.
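
One possible way to keep the writes in order while the calculations run in parallel is OpenMP's ordered construct. What follows is only a minimal sketch under assumptions, not a tested HDF5 program: blockStart, FakeDataspace and fillDataset are hypothetical names, the six nested loops are flattened into a single block counter, and every thread gets its own buffer. The only real OpenMP pieces are the parallel, for ordered schedule(static, 1) and ordered directives.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr std::size_t kRank = 6;
using Index = std::array<std::size_t, kRank>;

// Hypothetical helper: map a linear block number to the 6-D start index that
// the nested loops would have produced (last dimension varies fastest).
Index blockStart(std::size_t block, const Index& dims, const Index& step) {
  Index idx{};
  for (std::size_t d = kRank; d-- > 0;) {
    const std::size_t blocks = dims[d] / step[d];
    idx[d] = (block % blocks) * step[d];
    block /= blocks;
  }
  return idx;
}

// Stand-in for the expensive calculation; fills buf with a dummy value.
void calculate(std::vector<float>& buf, const Index& idx) {
  double res = 0;
  for (std::size_t i = 0; i < 10000; i++)
    res += std::sin(static_cast<double>(i));
  for (float& v : buf) v = static_cast<float>(idx[5] + 1 / (1 + res));
}

// Stand-in for selectHyperslab() + write(); it only records the write order.
struct FakeDataspace {
  std::vector<std::size_t> written;
  void write(std::size_t block, const std::vector<float>&) {
    written.push_back(block);
  }
};

// Calculates blocks in parallel and writes them strictly in block order.
void fillDataset(FakeDataspace& ds, const Index& dims, const Index& step) {
  std::size_t nBlocks = 1, bufSize = 1;
  for (std::size_t d = 0; d < kRank; d++) {
    assert(dims[d] % step[d] == 0);  // dims must be a multiple of the block size
    nBlocks *= dims[d] / step[d];
    bufSize *= step[d];
  }
  const long long n = static_cast<long long>(nBlocks);

#pragma omp parallel
  {
    std::vector<float> buf(bufSize);  // one private buffer per thread
#pragma omp for ordered schedule(static, 1)
    for (long long b = 0; b < n; b++) {
      calculate(buf, blockStart(static_cast<std::size_t>(b), dims, step));
#pragma omp ordered
      ds.write(static_cast<std::size_t>(b), buf);  // one at a time, in loop order
    }
  }
}
```

Built with OpenMP enabled (for example -fopenmp with GCC), the calculations of neighbouring blocks can overlap while each write waits for its predecessor; without the flag the pragmas are ignored and the loop simply runs sequentially. In real code the selectHyperslab call would presumably have to sit inside the ordered region too, since HDF5 is, as far as I know, only thread-safe when it has been built that way.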

I already took a quick look at OpenMP, but I can't wrap my head around it yet. Can anyone give me a hint or a working example? I am not good with parallelization, but wouldn't a writer thread with a buffer queue work? Or is OpenMP overkill anyway, and would pthreads suffice? Any help is kindly appreciated.
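
To make the writer-thread idea concrete, here is a rough sketch with plain C++11 threads instead of OpenMP or raw pthreads. Everything in it is an assumption for illustration: OrderedQueue and run are hypothetical names, the calculation is replaced by filling a buffer with the block number, and the HDF5 write is replaced by recording that number. Several calculation threads push finished buffers, and a single writer thread takes them out strictly in block order.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical bounded, reordering buffer queue (not part of HDF5 or OpenMP).
class OrderedQueue {
 public:
  explicit OrderedQueue(std::size_t capacity) : capacity_(capacity) {
    assert(capacity > 0);
  }

  // Called by calculation threads. Blocks while block n is too far ahead of
  // the writer, so at most capacity_ buffers are pending at any time.
  void push(std::size_t n, std::vector<float> buf) {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [&] { return n < next_ + capacity_; });
    pending_.emplace(n, std::move(buf));
    cv_.notify_all();
  }

  // Called by the writer thread. Waits until the next block in order arrived.
  std::vector<float> popNext() {
    std::unique_lock<std::mutex> lock(m_);
    cv_.wait(lock, [&] { return pending_.count(next_) != 0; });
    std::vector<float> buf = std::move(pending_[next_]);
    pending_.erase(next_);
    ++next_;
    cv_.notify_all();
    return buf;
  }

 private:
  std::size_t capacity_;
  std::size_t next_ = 0;  // number of the block the writer needs next
  std::map<std::size_t, std::vector<float>> pending_;
  std::mutex m_;
  std::condition_variable cv_;
};

// Runs nThreads calculation threads plus one writer thread and returns the
// block numbers in the order in which they were "written".
std::vector<std::size_t> run(std::size_t nBlocks, std::size_t nThreads,
                             std::size_t queueSize) {
  OrderedQueue queue(queueSize);
  std::atomic<std::size_t> nextBlock{0};
  std::vector<std::size_t> writeOrder;

  std::thread writer([&] {
    for (std::size_t n = 0; n < nBlocks; n++) {
      std::vector<float> buf = queue.popNext();
      // Real code: dataspace.selectHyperslab(...); dataspace.write(buf.data());
      writeOrder.push_back(static_cast<std::size_t>(buf[0]));
    }
  });

  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < nThreads; t++) {
    workers.emplace_back([&] {
      // Each thread repeatedly claims the next unclaimed block number.
      for (std::size_t n = nextBlock++; n < nBlocks; n = nextBlock++) {
        std::vector<float> buf(4096, static_cast<float>(n));  // fake calculate()
        queue.push(n, std::move(buf));
      }
    });
  }
  for (std::thread& w : workers) w.join();
  writer.join();
  return writeOrder;
}
```

Calling run(100, 4, 4) would correspond to four calculation threads and a queue size of 4. Because all HDF5 calls would live in the writer thread only, this layout sidesteps the question of whether the library is thread-safe, at the cost of moving each buffer through the queue.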

