Tuesday, October 10, 2017

generate_canonical with Mersenne Twister as a bottleneck: alternatives?

I have a multithreaded application that needs to draw a lot of random numbers in [0, 1).

Even though I am drawing a lot of samples, I am also doing some expensive work, yet profiling shows generate_canonical accounting for around 66% of each execution. I must assume I am doing something wrong.

Currently I have only one instance each of

std::uniform_real_distribution<float>       distr;
std::mt19937                                twister;

which are shared by all threads. I recently discovered that this is not thread-safe and will fix it soon, but this shouldn't cause a performance problem, right?

I initialize the above only once with

distr = std::uniform_real_distribution<float>(0.0f, 1.0f);
std::random_device rd;
twister = std::mt19937(rd());
twister.discard(100);
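Seeding std::mt19937 with a single 32-bit value from rd() selects only one of 2^32 possible starting states. A commonly used alternative is to fill a std::seed_seq from several std::random_device outputs. This is a minimal sketch, assuming C++11 or later; the helper name make_seeded_twister is hypothetical and not part of the original code.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <functional>
#include <random>

// Hypothetical helper: seed an mt19937 from many random_device outputs
// instead of a single 32-bit value.
std::mt19937 make_seeded_twister() {
    std::random_device rd;
    // One 32-bit seed word per word of mt19937 state (624 words).
    std::array<std::uint32_t, std::mt19937::state_size> seed_data{};
    // std::ref avoids copying random_device (it is not copyable).
    std::generate(seed_data.begin(), seed_data.end(), std::ref(rd));
    // seed_seq mixes the words before they initialize the engine state.
    std::seed_seq seq(seed_data.begin(), seed_data.end());
    return std::mt19937(seq);
}
```

This only changes how the engine starts; it does not change the cost of each draw.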

and then simply draw each sample with

distr(twister);

via a static wrapper around distr and twister.
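Since the shared engine is not thread-safe, one common fix is to give each thread its own engine and distribution through thread_local, so nothing is shared between threads. A minimal sketch, assuming C++11 or later; the function name random_unit_float is hypothetical and stands in for the static wrapper mentioned above.

```cpp
#include <random>

// Hypothetical per-thread wrapper: each thread lazily constructs its own
// engine and distribution on first use, so no locking is needed.
float random_unit_float() {
    // Seeded once per thread from random_device (assumed non-deterministic
    // on the target platform, so threads get different streams).
    thread_local std::mt19937 twister(std::random_device{}());
    thread_local std::uniform_real_distribution<float> distr(0.0f, 1.0f);
    return distr(twister);
}
```

Besides fixing the data race, this removes any cache-line traffic on the shared engine state; whether that is what dominates the profile here would need to be measured.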

The profiler highlights the operator() of the generator as the main bottleneck. Is it really this expensive? Am I doing something wrong here? If not, is there a faster alternative that still produces good-quality random numbers?
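As one possible alternative (a swapped-in technique, not something from the question), a small non-cryptographic generator such as SplitMix64 keeps 64 bits of state and does a few shifts and multiplies per call. The sketch below also converts the top 24 bits directly to a float in [0, 1), bypassing generate_canonical. The class name SplitMix64 and the function to_unit_float are hypothetical names for illustration; benchmark on your own workload before assuming a speedup.

```cpp
#include <cstdint>
#include <limits>

// Sketch of the SplitMix64 generator. It meets the UniformRandomBitGenerator
// requirements (result_type, min, max, operator()), so it can also drive
// std distributions if needed.
class SplitMix64 {
public:
    using result_type = std::uint64_t;

    explicit SplitMix64(std::uint64_t seed) : state_(seed) {}

    static constexpr result_type min() { return 0; }
    static constexpr result_type max() {
        return std::numeric_limits<result_type>::max();
    }

    result_type operator()() {
        std::uint64_t z = (state_ += 0x9E3779B97F4A7C15ULL);
        z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
        z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
        return z ^ (z >> 31);
    }

private:
    std::uint64_t state_;
};

// Map the top 24 bits to k * 2^-24 with 0 <= k < 2^24. Every such value is
// exactly representable as a float, so the result never rounds up to 1.0f.
inline float to_unit_float(std::uint64_t bits) {
    return static_cast<float>(bits >> 40) * (1.0f / 16777216.0f);
}
```

Each thread would own its own instance (for example as a thread_local) with a distinct seed. SplitMix64 is intended for simulation-style use, not for cryptography.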

To give more concrete numbers: in my profiling capture roughly 2/3 of the samples fall in generate_canonical, the total running time is 1 min 50 s on a 20-core Xeon at 3.1 GHz, and I draw around 300 million random numbers in total. Are these numbers to be expected?
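A rough per-draw cost estimate from these figures, assuming the 1 min 50 s is wall-clock time and, for the second line, that all 20 cores were busy for the whole run (neither is stated explicitly):

```latex
\begin{aligned}
t_{\text{gen}} &\approx \tfrac{2}{3} \times 110\ \text{s} \approx 73.3\ \text{s} \\
\text{wall-clock per draw} &\approx \frac{73.3\ \text{s}}{3 \times 10^{8}} \approx 244\ \text{ns} \\
\text{CPU time per draw (20 cores)} &\approx \frac{20 \times 73.3\ \text{s}}{3 \times 10^{8}} \approx 4.9\ \mu\text{s}
\end{aligned}
```

A single mt19937 draw is typically on the order of nanoseconds, so either figure suggests the time is going to something other than raw generation, which would be consistent with the threads contending for the one shared engine; only profiling a per-thread version would confirm it.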
