Wednesday, 27 December 2017

C++ std::async: spreading work over several cores has almost no effect

This question is related to:

c++ std::async : faster on 4 cores compared to 8 cores

In the previous question, I wondered why some code ran faster on 4 cores than on 8 (answer: my CPU has 4 cores and 8 threads).

Now I am discovering that the code is even faster on a single core.

I am on Ubuntu 16.06, using C++11, on an Intel® Core™ i7-8550U CPU @ 1.80GHz × 8.

Here is the code for benchmarking computation time against the number of cores used:

#include <math.h>
#include <future>
#include <ctime>
#include <vector>
#include <iostream>

#define NB_JOBS 2000.0
#define MAX_CORES 8

// no special meaning to this function, 
// just uses some CPU
static bool _expensive(int nb_jobs){
  for(int job=0;job<nb_jobs;job++){
    float x = 0.6;
    bool b = true;
    double f = 1;
    for(int i=0;i<1000;i++){
      if(!b) f=-1;
      for(double j=1;j<2.0;j+=0.01) x+= f* pow(1.0/sin(x),j);
      b = !b;
    }
  }
  return true;
}

static double _duration(int nb_cores){

  std::clock_t begin = clock();

  int nb_jobs_per_core = rint ( NB_JOBS / (float)nb_cores );

  std::vector < std::future<bool> > futures;
  for(int i=0;i<nb_cores;i++){
    futures.push_back( std::async(std::launch::async,_expensive,nb_jobs_per_core));
  }
  for (auto &e: futures) {
    bool foo = e.get();
  }

  std::clock_t end = clock();

  double duration = double(end - begin) / CLOCKS_PER_SEC;
  return duration;

}


int main(){

  for(int nb_cores=1 ; nb_cores<=MAX_CORES ; nb_cores++){

    double duration = _duration(nb_cores);
    std::cout << nb_cores << ": " << duration << "\n";

  }

  return 0;

}

Here is the output (number of cores: duration in seconds):

1: 8.55817
2: 8.76621
3: 7.90191
4: 8.4656
5: 10.5494
6: 11.6175
7: 21.697
8: 24.3621

Using more cores seems to have only a marginal impact. Note: htop shows the virtual cores being used as the program intends, i.e. first one core at 100%, then 2, ..., and finally 8.
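A possible explanation for these numbers, offered as an assumption rather than something the post confirms: `std::clock` reports processor time consumed by the whole process, which on Linux is summed over all threads, so it does not shrink when the work is spread over cores. The sketch below times the same kind of workload with both `std::clock` and `std::chrono::steady_clock` (wall-clock time). `burn_cpu` and `time_tasks` are hypothetical names, and `burn_cpu` is a simplified stand-in for `_expensive`.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <ctime>
#include <future>
#include <utility>
#include <vector>

// Hypothetical stand-in for the post's _expensive(): just burns some CPU.
static bool burn_cpu(int nb_jobs) {
  volatile double x = 0.6;  // volatile so the loop is not optimized away
  for (int job = 0; job < nb_jobs; ++job) {
    for (int i = 0; i < 100000; ++i) {
      x = x + std::sin(x);
    }
  }
  return true;
}

// Runs nb_tasks asynchronous tasks and returns {wall seconds, CPU seconds}.
static std::pair<double, double> time_tasks(int nb_tasks, int jobs_per_task) {
  assert(nb_tasks > 0);

  const auto wall_begin = std::chrono::steady_clock::now();
  const std::clock_t cpu_begin = std::clock();

  std::vector<std::future<bool>> futures;
  for (int i = 0; i < nb_tasks; ++i) {
    futures.push_back(std::async(std::launch::async, burn_cpu, jobs_per_task));
  }
  for (auto &f : futures) {
    f.get();  // block until every task has finished
  }

  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  // Elapsed real time, independent of how many threads were busy.
  const double wall =
      std::chrono::duration<double>(wall_end - wall_begin).count();
  // Processor time used by the process between the two std::clock calls.
  const double cpu = double(cpu_end - cpu_begin) / CLOCKS_PER_SEC;
  return {wall, cpu};
}
```

Calling `time_tasks(4, 500)` from `main` and printing both values should show whether the wall-clock duration drops as tasks are added while the CPU time stays roughly constant. On Linux the program typically needs to be built with `-pthread`.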

If I replace:

futures.push_back( std::async(std::launch::async,[...]

with:

futures.push_back( std::async(std::launch::async|std::launch::deferred,[...]

then I get:

1: 8.6459
2: 8.69905
3: 10.7763
4: 11.4505
5: 11.8426
6: 10.4282
7: 9.55181
8: 9.05565

and htop shows only one virtual core used at 100% for the full duration.
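With `std::launch::async | std::launch::deferred`, the implementation is free to choose either policy. The htop observation above suggests (an assumption, not verified in the post) that deferred was chosen, in which case each task runs lazily on the thread that calls `get()`. Here is a minimal sketch for checking which policy a future received, using `wait_for` with a zero timeout; `is_deferred` and `answer` are hypothetical names:

```cpp
#include <cassert>
#include <chrono>
#include <future>

// Returns true if the future holds a deferred task, i.e. one that will
// only run on the calling thread when get() or wait() is invoked.
template <typename T>
static bool is_deferred(const std::future<T> &f) {
  assert(f.valid());
  return f.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
}

// Hypothetical trivial task used only to obtain a future.
static int answer() { return 42; }
```

For example, `is_deferred(fut)` on a future created with `std::launch::deferred` returns true, while one created with `std::launch::async` returns false. Applying the same check to the futures in `_duration` would show what the mixed policy resolved to on a given standard library.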

Am I doing anything wrong?

Note: I tried this on several desktops with various specs (number of cores and threads) and observed something similar.
