I've been measuring the performance of different C/C++ allocation (and initialization) techniques for big chunks of contiguous memory. To do so, I allocated (and wrote to) 100 randomly selected sizes, drawn from a uniform distribution over the range 20 to 4096 MB, and measured the time using std::chrono::high_resolution_clock. Each measurement is done by a separate execution of the program, i.e. there should be no memory reuse (at least within the process).
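For reference, the size selection described above could look roughly like the sketch below. The helper name draw_sizes, the explicit seed parameter, and the reading of 1 MB as 2^20 bytes are assumptions made for illustration, not details from my actual benchmark.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical helper: draws `count` buffer sizes in bytes, uniformly
// distributed over [min_mb, max_mb] megabytes.
// Assumption: 1 MB is taken as 2^20 bytes.
std::vector<std::uint64_t> draw_sizes(std::size_t count,
                                      std::uint64_t min_mb,
                                      std::uint64_t max_mb,
                                      std::uint64_t seed) {
    std::mt19937_64 rng(seed);
    std::uniform_int_distribution<std::uint64_t> dist(min_mb, max_mb);

    std::vector<std::uint64_t> sizes;
    sizes.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        sizes.push_back(dist(rng) * 1024u * 1024u);  // MB -> bytes
    }
    return sizes;
}
```

Calling draw_sizes(100, 20, 4096, seed) gives the 100 sizes; each one would then be passed to a separate run of the benchmark program.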
madvise ON refers to calling madvise with the MADV_HUGEPAGE flag, i.e. enabling transparent huge pages (2 MB on my systems).
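A minimal sketch of what one "mmap+memset, madvise ON" measurement might look like on Linux is given below. The function name mmap_memset_gib_per_s is hypothetical, std::chrono::steady_clock is swapped in for high_resolution_clock because it is guaranteed monotonic, and reporting throughput in units of 2^30 bytes per second is an assumption.

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>  // mmap, madvise, munmap (Linux)

// Hypothetical helper: maps `bytes` of anonymous memory, optionally asks for
// transparent huge pages, writes every byte once, and returns the combined
// allocation + write throughput in GiB/s (assumed unit: 2^30 bytes).
// Returns a negative value if the mapping fails.
double mmap_memset_gib_per_s(std::size_t bytes, bool use_madvise) {
    const auto start = std::chrono::steady_clock::now();

    void* p = mmap(nullptr, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) {
        return -1.0;
    }

    if (use_madvise) {
        // Advisory only: the call can fail or be ignored, for example when
        // transparent huge pages are disabled, so the result is not checked.
        (void)madvise(p, bytes, MADV_HUGEPAGE);
    }

    std::memset(p, 1, bytes);  // first write to the freshly mapped pages

    const auto stop = std::chrono::steady_clock::now();
    munmap(p, bytes);

    const double seconds = std::chrono::duration<double>(stop - start).count();
    return (static_cast<double>(bytes) / (1024.0 * 1024.0 * 1024.0)) / seconds;
}
```

The timed region here covers the mapping, the madvise call and the write together, which is how I read the "allocation + write" rows; a plain memset row would instead time only a second memset over a buffer that has already been written once.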
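Using a single 16 GB module of DDR4 with a transfer rate of 2400 MT/s and a data width of 64 bits, I get a theoretical maximum speed of 17.8 GB/s (2400 × 10^6 transfers/s × 8 bytes = 19.2 × 10^9 bytes/s, which is about 17.88 when expressed in units of 2^30 bytes/s).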
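The arithmetic behind that figure can be written out as a small sketch. The function names are hypothetical, and treating "GB" as 2^30 bytes is an assumption that happens to reproduce the quoted number.

```cpp
// Peak transfer rate in bytes per second:
// (transfers per second) x (bytes per transfer) x (number of channels).
constexpr double peak_bytes_per_s(double mega_transfers_per_s,
                                  int bus_width_bits,
                                  int channels) {
    return mega_transfers_per_s * 1e6 * (bus_width_bits / 8.0) * channels;
}

// Converts bytes per second to units of 2^30 bytes per second.
// Assumption: this is the unit behind the "GB/s" figures in the question.
constexpr double to_gib_per_s(double bytes_per_s) {
    return bytes_per_s / (1024.0 * 1024.0 * 1024.0);
}

// DDR4 at 2400 MT/s with a 64-bit data width, one and two channels.
constexpr double single_channel = to_gib_per_s(peak_bytes_per_s(2400, 64, 1));
constexpr double dual_channel   = to_gib_per_s(peak_bytes_per_s(2400, 64, 2));
```

Under that assumption single_channel comes out at roughly 17.88 and dual_channel at roughly 35.76.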
On Ubuntu 18.04.5 LTS (4.15.0-118-generic), memset of an already allocated memory block gets close to the theoretical limit, but page_aligned allocation + memset is somewhat slower, as expected. new[] is very slow, probably due to its internal overhead (values in GB/s):
method               madvise  mean (GB/s)  std (GB/s)
memset               OFF      17.3         0.32
page_aligned+memset  ON       11.4         0.21
mmap+memset          ON       11.3         0.23
new<double>[]()      ON        3.2         0.06
Using two modules, I was expecting close to double the performance (say 35 GB/s) thanks to dual-channel operation, at least for the write operation:
method               madvise  mean (GB/s)  std (GB/s)
memset               OFF      28.0         0.23
mmap+memset          ON       14.5         0.18
page_aligned+memset  ON       14.4         0.17
As you can see, memset reaches only 80% of the theoretical speed. The allocation + write speed increases by only 3 GB/s, reaching only about 40% of the theoretical speed of the memory.
To make sure that I had not messed something up in the OS (I have been using it for a few years now), I installed a fresh Ubuntu 20.04 (dual boot) and repeated the experiment. The fastest operations were these:
method               madvise  mean (GB/s)  std (GB/s)
memset               OFF      29.1         0.86
page_aligned+memset  ON       10.5         0.27
mmap+memset          ON       10.5         0.31
As you can see, the results are reasonably similar for memset, but actually even worse for allocation + write operations.
Are you aware of a faster way of allocating (and initializing) big chunks of memory? For the record, I have tested combinations of malloc, new<float/double>, calloc, operator new, mmap and page_aligned for allocation, and memset and a for loop for writing, together with the madvise flag.