I've decided to parallelize a huge program I wrote and eventually I came across the new C++11 smart pointers.
I had a routine that should be executed many times (usually above 1000 times) that is somewhat expensive. It was ran in a dead simple for loop, what I did was to install this for loop in a method which would be ran by some worker threads.
Did it so, made some parameters wrapped by a std::shared_ptr
, cared for race conditions, everything seemed fine.
But now, some times, process will be aborted and I am given one of the following errors:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
or
malloc.c:2395: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
All of these errors occur while the parallel for is ongoing; not before, not right in the beginning, not in the end, but somewhere in between, which smells me like a race condition not covered.
The program is huge, but I created a miniature of it that is able to reproduce the problem:
#include <iostream>
#include <vector>
#include <unordered_map>
#include <thread>
#include <set>
#include <memory>
#include <atomic>
namespace std {
template <>
struct hash<std::multiset<unsigned long>>
{
std::size_t operator()(const std::multiset<unsigned long>& k) const
{
std::size_t r = 0;
bool shift = false;
for (auto&& it : k) {
r = (r >> !shift) ^ (std::hash<unsigned long>()(it) << shift);
shift = !shift;
}
return r;
}
};
}
typedef std::unordered_map<std::multiset<unsigned long>, int*> graphmap;
std::multiset<unsigned long>* bar(int pos) {
std::multiset<unsigned long> *label = new std::multiset<unsigned long>;
label->insert(pos%5);
label->insert(pos%2);
return label;
}
void foo(std::shared_ptr<graphmap> &kSubgraphs, int pos) {
int *v = (*kSubgraphs)[*bar(pos)];
if(v == nullptr) {
v = new int[pos+1]();
v[0]++;
} else {
v[pos]++;
}
}
void worker(std::shared_ptr<std::atomic_int> *counter, int n, std::shared_ptr<graphmap> *kSubgraphs)
{
for (int pos = (*counter)->fetch_add(1); pos <= n; pos = (*counter)->fetch_add(1)) {
if (pos%100==0) std::cout << pos << std::endl;
foo(*kSubgraphs, pos);
}
}
int main() {
int n = 1000;
std::vector<std::thread> threads;
std::shared_ptr<graphmap> kSubgraphs = std::make_shared<graphmap>();
std::shared_ptr<std::atomic_int> counter = std::make_shared<std::atomic_int>(0);
for (int i=0; i<5; i++) {
foo(kSubgraphs, n);
}
for (int i=0; i<4; i++) {
threads.push_back(std::thread(worker, &counter, n, &kSubgraphs));
}
for(auto& th : threads) th.join();
return 0;
}
This code basically mimics the behavior of the original code, which is to use an unordered_map keyed by a multiset with values that are pointers to int arrays. First some keys are inserted and the array initialized (maybe the problem is caused by the way I am initializing it?) and finally the worker threads run to update a unique position of the array of an entry of the unordered_map.
Two threads may access the same map entry simultaneously, but they will never write to the same index of the array at the same time.
Different from the original code, this code won't throw any errors when running over internet compilers such as ideone.com, I also tried to run it from CLion ide and no errors will occur (maybe it can occur errors if one tries enough times), but I got similar error as the original when running from the command line multiple times. I compiled it with:
g++ -std=c++11 -pthread -o test.exe test.cpp
And after running several times it eventually gives this error:
*** Error in `./test.exe': double free or corruption (fasttop): 0x00000000006a2d30 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7fccc9d4f725]
/lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7fccc9d57f4a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7fccc9d5babc]
./test.exe[0x404e9e]
./test.exe[0x40431b]
./test.exe[0x4045ed]
./test.exe[0x407c6c]
./test.exe[0x4078d6]
./test.exe[0x40742a]
./test.exe[0x40869e]
./test.exe[0x4086be]
./test.exe[0x4085dd]
./test.exe[0x40842d]
./test.exe[0x4023a2]
./test.exe[0x401d55]
./test.exe[0x401c4a]
./test.exe[0x401c66]
./test.exe[0x401702]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7fccc9cf8830]
./test.exe[0x401199]
======= Memory map: ========
00400000-0040f000 r-xp 00000000 08:05 12202697 /home/rodrigo/test.exe
0060e000-0060f000 r--p 0000e000 08:05 12202697 /home/rodrigo/test.exe
0060f000-00610000 rw-p 0000f000 08:05 12202697 /home/rodrigo/test.exe
00691000-006c3000 rw-p 00000000 00:00 0 [heap]
7fcca8000000-7fcca8089000 rw-p 00000000 00:00 0
7fcca8089000-7fccac000000 ---p 00000000 00:00 0
7fccb0000000-7fccb008b000 rw-p 00000000 00:00 0
7fccb008b000-7fccb4000000 ---p 00000000 00:00 0
7fccb8000000-7fccb8089000 rw-p 00000000 00:00 0
7fccb8089000-7fccbc000000 ---p 00000000 00:00 0
7fccc0000000-7fccc007c000 rw-p 00000000 00:00 0
7fccc007c000-7fccc4000000 ---p 00000000 00:00 0
7fccc79cb000-7fccc79cc000 ---p 00000000 00:00 0
7fccc79cc000-7fccc81cc000 rw-p 00000000 00:00 0
7fccc81cc000-7fccc81cd000 ---p 00000000 00:00 0
7fccc81cd000-7fccc89cd000 rw-p 00000000 00:00 0
7fccc89cd000-7fccc89ce000 ---p 00000000 00:00 0
7fccc89ce000-7fccc91ce000 rw-p 00000000 00:00 0
7fccc91ce000-7fccc91cf000 ---p 00000000 00:00 0
7fccc91cf000-7fccc99cf000 rw-p 00000000 00:00 0
7fccc99cf000-7fccc9ad7000 r-xp 00000000 08:05 24126366 /lib/x86_64-linux-gnu/libm-2.23.so
7fccc9ad7000-7fccc9cd6000 ---p 00108000 08:05 24126366 /lib/x86_64-linux-gnu/libm-2.23.so
7fccc9cd6000-7fccc9cd7000 r--p 00107000 08:05 24126366 /lib/x86_64-linux-gnu/libm-2.23.so
7fccc9cd7000-7fccc9cd8000 rw-p 00108000 08:05 24126366 /lib/x86_64-linux-gnu/libm-2.23.so
7fccc9cd8000-7fccc9e98000 r-xp 00000000 08:05 24126374 /lib/x86_64-linux-gnu/libc-2.23.so
7fccc9e98000-7fccca097000 ---p 001c0000 08:05 24126374 /lib/x86_64-linux-gnu/libc-2.23.so
7fccca097000-7fccca09b000 r--p 001bf000 08:05 24126374 /lib/x86_64-linux-gnu/libc-2.23.so
7fccca09b000-7fccca09d000 rw-p 001c3000 08:05 24126374 /lib/x86_64-linux-gnu/libc-2.23.so
7fccca09d000-7fccca0a1000 rw-p 00000000 00:00 0
7fccca0a1000-7fccca0b9000 r-xp 00000000 08:05 24126373 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fccca0b9000-7fccca2b8000 ---p 00018000 08:05 24126373 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fccca2b8000-7fccca2b9000 r--p 00017000 08:05 24126373 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fccca2b9000-7fccca2ba000 rw-p 00018000 08:05 24126373 /lib/x86_64-linux-gnu/libpthread-2.23.so
7fccca2ba000-7fccca2be000 rw-p 00000000 00:00 0
7fccca2be000-7fccca2d4000 r-xp 00000000 08:05 24121519 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fccca2d4000-7fccca4d3000 ---p 00016000 08:05 24121519 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fccca4d3000-7fccca4d4000 rw-p 00015000 08:05 24121519 /lib/x86_64-linux-gnu/libgcc_s.so.1
7fccca4d4000-7fccca646000 r-xp 00000000 08:05 6029347 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7fccca646000-7fccca846000 ---p 00172000 08:05 6029347 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7fccca846000-7fccca850000 r--p 00172000 08:05 6029347 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7fccca850000-7fccca852000 rw-p 0017c000 08:05 6029347 /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.21
7fccca852000-7fccca856000 rw-p 00000000 00:00 0
7fccca856000-7fccca87c000 r-xp 00000000 08:05 24126370 /lib/x86_64-linux-gnu/ld-2.23.so
7fcccaa44000-7fcccaa4a000 rw-p 00000000 00:00 0
7fcccaa78000-7fcccaa7b000 rw-p 00000000 00:00 0
7fcccaa7b000-7fcccaa7c000 r--p 00025000 08:05 24126370 /lib/x86_64-linux-gnu/ld-2.23.so
7fcccaa7c000-7fcccaa7d000 rw-p 00026000 08:05 24126370 /lib/x86_64-linux-gnu/ld-2.23.so
7fcccaa7d000-7fcccaa7e000 rw-p 00000000 00:00 0
7ffc6b1c8000-7ffc6b1e9000 rw-p 00000000 00:00 0 [stack]
7ffc6b1fa000-7ffc6b1fc000 r--p 00000000 00:00 0 [vvar]
7ffc6b1fc000-7ffc6b1fe000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
[1] 26639 abort (core dumped) ./test.exe
At last, when running the original code with debug mode on in CLion set up to capture Exceptions as breakpoints, it shows a Signal interruption of SIGBUS(Bus error)
or a SIGSEGV(Segmentation fault)
sometimes and last line of code executed before the interruption maps to the line:
int *v = (*kSubgraphs)[*bar(pos)];
in the foo
function of the code presented here.
I am a little bit lost with this one. My strongest assumption is that I am using smart pointers the wrong way, despite I do not see where.
Aucun commentaire:
Enregistrer un commentaire