I have 16000 jobs to perform.
Each job is independent. There is no shared memory, no interprocess communication, no lock or mutex.
I am on ubuntu 16.06. c++11. Intel® Core™ i7-8550U CPU @ 1.80GHz × 8
I use std::async to split jobs between cores.
If I split the jobs into 8 (2000 per core), computation time is 145. If I split the jobs into 4 (4000 per core), computation time is 60.
Output after reduce is the same in both case.
If I monitor the CPU during computation (just using htop), things happen as expected (8 cores are used at 100% in first case, only 4 cores are used 100% in second case).
I am very confused why 4 cores would process much faster than 8.
Aucun commentaire:
Enregistrer un commentaire