This question is related to:
c++ std::async : faster on 4 cores compared to 8 cores
In the previous question, I was wondering why some code would run faster on 4 cores than on 8 (answer: my CPU has 4 physical cores and 8 hardware threads).
Now I am discovering that the code is even faster on a single core.
I am on Ubuntu 16.06, C++11, Intel® Core™ i7-8550U CPU @ 1.80GHz × 8.
Here is the code for benchmarking computation time against the number of cores used:
    #include <math.h>
    #include <future>
    #include <ctime>
    #include <vector>
    #include <iostream>

    #define NB_JOBS 2000.0
    #define MAX_CORES 8

    // no special meaning to this function,
    // just uses some CPU
    static bool _expensive(int nb_jobs){
        for(int job=0;job<nb_jobs;job++){
            float x = 0.6;
            bool b = true;
            double f = 1;
            for(int i=0;i<1000;i++){
                if(!b) f=-1;
                for(double j=1;j<2.0;j+=0.01) x+= f* pow(1.0/sin(x),j);
                b = !b;
            }
        }
        return true;
    }

    static double _duration(int nb_cores){
        std::clock_t begin = clock();
        int nb_jobs_per_core = rint ( NB_JOBS / (float)nb_cores );
        std::vector < std::future<bool> > futures;
        for(int i=0;i<nb_cores;i++){
            futures.push_back( std::async(std::launch::async,_expensive,nb_jobs_per_core));
        }
        for (auto &e: futures) {
            bool foo = e.get();
        }
        std::clock_t end = clock();
        double duration = double(end - begin) / CLOCKS_PER_SEC;
        return duration;
    }

    int main(){
        for(int nb_cores=1 ; nb_cores<=MAX_CORES ; nb_cores++){
            double duration = _duration(nb_cores);
            std::cout << nb_cores << ": " << duration << "\n";
        }
        return 0;
    }
Here is the output (number of cores: duration in seconds):
1: 8.55817
2: 8.76621
3: 7.90191
4: 8.4656
5: 10.5494
6: 11.6175
7: 21.697
8: 24.3621
Using more cores seems to have only a marginal impact up to 4, and from 7 onwards the reported time roughly doubles. Note: htop shows the virtual cores being used as the program intends, i.e. first one core at 100%, then 2, ..., and at the end all 8.
If I replace:
futures.push_back( std::async(std::launch::async,[...]
with:
futures.push_back( std::async(std::launch::async|std::launch::deferred,[...]
then I get:
1: 8.6459
2: 8.69905
3: 10.7763
4: 11.4505
5: 11.8426
6: 10.4282
7: 9.55181
8: 9.05565
and htop shows only 1 virtual core being used, at 100%, for the full duration.
Am I doing anything wrong?
Note: I tried this on several desktops with different specs (number of cores and number of threads) and observed similar results.