I'm trying to read and process multiple files in parallel. Right now I have two files and two parsing functions, which I call in two threads.
In the first case, I'm constructing strings from parts of the file (reading the CSV headers). The first function:
void csv_parse_items_file(const char* file, size_t fsize,
                          //void(*deal)(const string&, const size_t&, const int&),
                          size_t arrstart_counter = 0) {
    size_t idx = 0;
    int line = 0;
    size_t last_idx = 0;
    int counter = 0;
    cout<<"items_header before loop, thread_id="+std::to_string(thread_index())<<endl;
    map<string, int> headers;
    {
        int counter = 0;
        while (file[idx] && file[idx] != '\n') {
            if (file[idx] == '\t' || file[idx] == '\n') {
                string key(file, last_idx, idx - last_idx);
                headers[key] = counter++;
                last_idx = idx + 1;
            }
            ++idx;
        }
    }
    cout<<"items_header after loop, thread_id="+std::to_string(thread_index())<<endl;
... the processing then continues in a loop.
The second function:
void csv_parse_users_file(const char* file, size_t fsize,
                          //void(*deal)(const string&, const size_t&, const int&),
                          size_t arrstart_counter = 0) {
    size_t idx = 0;
    int line = 0;
    size_t last_idx = 0;
    int counter = 0;
    map<string, int> headers;
    {
        int counter = 0;
        while (file[idx] && file[idx] != '\n') {
            if (file[idx] == '\t' || file[idx] == '\n') {
                string key(file, last_idx, idx - last_idx);
                headers[key] = counter++;
                last_idx = idx + 1;
            }
            ++idx;
        }
    }
When I run in this configuration, the output is:
users_mapped 86431022
items_mapped237179072
1497021
1306055
items_header before loop, thread_id=0
processed 100000users thread_id:1
processed 200000users thread_id:1
processed 300000users thread_id:1
processed 400000users thread_id:1
processed 500000users thread_id:1
processed 600000users thread_id:1
processed 700000users thread_id:1
processed 800000users thread_id:1
processed 900000users thread_id:1
processed 1000000users thread_id:1
processed 1100000users thread_id:1
processed 1200000users thread_id:1
processed 1300000users thread_id:1
processed 1400000users thread_id:1
finished_processing_users:1497020
0x700008d52c80 finished
items_header after loop, thread_id=0
processed 100000items, thread_id:0
processed 200000items, thread_id:0
processed 300000items, thread_id:0
processed 400000items, thread_id:0
processed 500000items, thread_id:0
processed 600000items, thread_id:0
processed 700000items, thread_id:0
processed 800000items, thread_id:0
processed 900000items, thread_id:0
processed 1000000items, thread_id:0
processed 1100000items, thread_id:0
processed 1200000items, thread_id:0
processed 1300000items, thread_id:0
finished_processing_items:1306054
Now, if I edit the first function, comment out the line string key(file, last_idx, idx - last_idx); and use a constant key instead,
the first function starts like this:
void csv_parse_items_file(const char* file, size_t fsize,
                          //void(*deal)(const string&, const size_t&, const int&),
                          size_t arrstart_counter = 0) {
    size_t idx = 0;
    int line = 0;
    size_t last_idx = 0;
    int counter = 0;
    cout<<"items_header before loop, thread_id="+std::to_string(thread_index())<<endl;
    map<string, int> headers;
    {
        int counter = 0;
        while (file[idx] && file[idx] != '\n') {
            if (file[idx] == '\t' || file[idx] == '\n') {
                //string key(file, last_idx, idx - last_idx);
                headers["ok"] = counter++;
                last_idx = idx + 1;
            }
The output is:
users_mapped 86431022
items_mapped237179072
1497021
1306055
items_header before loop, thread_id=0
items_header after loop, thread_id=0
processed 100000users thread_id:1
processed 200000users thread_id:1
processed 300000users thread_id:1
processed 100000items, thread_id:0
processed 400000users thread_id:1
processed 500000users thread_id:1
processed 200000items, thread_id:0
processed 600000users thread_id:1
processed 700000users thread_id:1
processed 300000items, thread_id:0
processed 800000users thread_id:1
processed 900000users thread_id:1
processed 1000000users thread_id:1
processed 400000items, thread_id:0
processed 1100000users thread_id:1
processed 1200000users thread_id:1
processed 500000items, thread_id:0
processed 1300000users thread_id:1
processed 1400000users thread_id:1
finished_processing_users:1497020
0x700001870c80 finished
processed 600000items, thread_id:0
processed 700000items, thread_id:0
processed 800000items, thread_id:0
processed 900000items, thread_id:0
processed 1000000items, thread_id:0
processed 1100000items, thread_id:0
processed 1200000items, thread_id:0
processed 1300000items, thread_id:0
finished_processing_items:1306054
The header line is less than 1000 characters, which is tiny compared with the size of the files (86431022 and 237179072 bytes).
$ g++ -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.1.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
Compiled with g++ -pthread -c -g -std=c++11
The files are mmapped with mmap(NULL, size_, PROT_READ, MAP_PRIVATE, fd_, 0);
I can't figure out why constructing the string in both threads, from two different mmapped files and with no shared variables other than cout, causes one thread to wait for the other. Are there any locks involved when constructing a std::string?