I have a multithreaded application that needs to draw a large number of random samples in [0, 1).
I am drawing many samples, but I am also doing other expensive work; even so, profiling shows std::generate_canonical accounting for around 66% of each run. I have to assume I am doing something wrong.
Currently I have only one instance of
std::uniform_real_distribution<float> distr;
std::mt19937 twister;
that are accessed by all threads. I have recently discovered that this is not thread safe and will fix it soon, but that by itself shouldn't create a performance problem, right?
I initialize the above only once with
distr = std::uniform_real_distribution<float>(0.0f, 1.0f);
std::random_device rd;
twister = std::mt19937(rd());
twister.discard(100);
And then I simply draw a sample as
distr(twister);
via a static wrapper around the distribution and the engine.
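As an aside on the thread-safety issue: a common way to fix the shared state (a sketch under the assumption that each thread can keep its own generator, with illustrative names, not the actual wrapper) is to make the engine and distribution thread_local, which also removes any cross-thread contention on the generator's state:

```cpp
#include <random>

// Sketch: per-thread engine + distribution. Each thread seeds and owns
// its own state, so there is no data race and no cache-line contention.
float random_unit_float() {
    thread_local std::mt19937 twister(std::random_device{}());
    thread_local std::uniform_real_distribution<float> distr(0.0f, 1.0f);
    return distr(twister);
}
```

Each thread pays the seeding cost once, on its first call.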
The profiler highlights the generator's operator() as the main bottleneck. Is it really this expensive? Am I doing something wrong here? If not, is there an alternative that provides good random results and is faster?
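For comparison, one frequently suggested alternative when cryptographic-strength randomness is not needed (sketched here with a classic Marsaglia xorshift engine, not code from the question) is to generate raw 32-bit words cheaply and convert the top 24 bits directly to a float in [0, 1), bypassing generate_canonical entirely:

```cpp
#include <cstdint>

// Sketch: a minimal xorshift32 engine (Marsaglia's 13/17/5 variant).
// Much cheaper per call than std::mt19937, at the cost of weaker
// statistical quality; whether that trade-off is acceptable depends
// on the application.
struct XorShift32 {
    uint32_t state;
    explicit XorShift32(uint32_t seed) : state(seed ? seed : 1u) {}
    uint32_t next() {
        uint32_t x = state;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        return state = x;
    }
};

// Keep only the top 24 bits so every result is exactly representable
// as a float; multiplying by 2^-24 maps them uniformly onto [0, 1).
inline float to_unit_float(uint32_t bits) {
    return (bits >> 8) * (1.0f / 16777216.0f);  // 16777216 = 2^24
}
```

Seeding each thread's engine with a distinct value (e.g. from std::random_device) keeps the streams independent.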
To give concrete numbers: in my profiling capture, roughly 2/3 of the samples fall in generate_canonical, the total running time is 1:50 min on a 20-core Xeon @ 3.1 GHz, and I draw around 300 million samples in total. Are those numbers to be expected?