Wednesday, August 9, 2023

C++: Why does this code, which writes concurrently to a vector from multiple threads, work without corrupting any data?

Note: This is compiled with GCC 11.2.0

The following is a simplified version of my more complicated code. My actual code is too complicated to explain, so I tried to create a simple reproducible version of the problem, but the simple version actually works. Setting my real code aside, I'm now more interested in why this simple example works at all.

Basically, I'm creating a vector of maps. The number of maps equals the number of threads I'm running, and each thread populates its own map in the vector. Since the vector is being written to by multiple threads at the same time, I expected the data to get corrupted: a vector is contiguous memory that can change dynamically at run time, right? So how does this not end up with corrupted data?

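#define _GLIBCXX_USE_NANOSLEEP  // must be defined before any #include
#include <cstdlib>        // std::rand, std::srand
#include <ctime>          // time
#include <functional>     // std::ref
#include <iostream>
#include <string>         // std::stoi
#include <thread>
#include <unordered_map>
#include <vector>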

void add_map(std::unordered_map<int, int>& um, int map_size) {
  for (int i = 0; i < map_size; i++) {
    um.insert({i, i});
  }
}

void print_map(std::unordered_map<int, int>& um) {
  for (auto& u : um) { std::cout << u.first << " " << u.second << std::endl; }
}

int main(int argc, char* argv[]) {
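  // Get number of threads from the command line and set random seed
  if (argc < 2) { std::cerr << "Usage: " << argv[0] << " <num_threads>" << std::endl; return 1; }
  int num_threads = std::stoi(argv[1]);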
  std::srand((unsigned) time(NULL));

  // Populate a vector with num_threads maps, so each thread can write to its own map inside the vector
  std::vector<std::unordered_map<int, int>> chunks;
  for (int i = 0; i < num_threads; i++) {
    std::unordered_map<int, int> empty_map;
    chunks.push_back(empty_map);
  }

  // Come up with the sizes for each map that will be handled by each thread
  std::vector<int> chunk_sizes;
  for (int i = 0; i < num_threads; i++) {
    int chunk_size = 1 + (std::rand() % 5);
    chunk_sizes.push_back(chunk_size);
  }
  std::cout << "Chunk sizes" << std::endl;
  for (int i = 0; i < num_threads; i++) { std::cout << chunk_sizes[i] << " "; }
  std::cout << std::endl;

  // Start one thread per map so that the elements of the vector 'chunks' get populated concurrently
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; i++) {
    threads.push_back(std::thread(add_map, std::ref(chunks[i]), chunk_sizes[i]));
  }
  for (auto& t : threads) { t.join(); }

  // Print the maps
  for (int i = 0; i < chunks.size(); i++) {
    std::cout << "=== chunk: " << i << "; size: " << chunk_sizes[i] << " ===" << std::endl;
    print_map(chunks[i]);
  }
}

When run with an argument of 5, I get the following output:

Chunk sizes
2 5 5 4 4
=== chunk: 0; size: 2 ===
1 1
0 0
=== chunk: 1; size: 5 ===
4 4
3 3
2 2
1 1
0 0
=== chunk: 2; size: 5 ===
4 4
3 3
2 2
1 1
0 0
=== chunk: 3; size: 4 ===
3 3
2 2
1 1
0 0
=== chunk: 4; size: 4 ===
3 3
2 2
1 1
0 0

Looks like the vector of maps is being handled as expected. But why?
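
A minimal sketch of a check that might help narrow this down. It assumes that what matters is whether the vector's own storage changes while the threads are running. In the example above, every push_back on chunks happens before any thread starts, and each thread only touches its own element. The helper name fill_chunks_concurrently and the Map alias are hypothetical and not part of the code above.

```cpp
#include <cstddef>
#include <functional>
#include <thread>
#include <unordered_map>
#include <vector>

using Map = std::unordered_map<int, int>;  // hypothetical alias for brevity

// Same worker as in the example above: fills one map with {i, i} pairs.
void add_map(Map& um, int map_size) {
  for (int i = 0; i < map_size; i++) {
    um.insert({i, i});
  }
}

// Hypothetical helper: sizes the vector first, then runs one thread per element.
// Returns true if the vector's buffer address and size are the same before
// and after the threads run.
bool fill_chunks_concurrently(std::vector<Map>& chunks, const std::vector<int>& sizes) {
  chunks.assign(sizes.size(), Map{});  // all resizing happens here, on a single thread
  const Map* buffer_before = chunks.data();
  const std::size_t size_before = chunks.size();

  std::vector<std::thread> threads;
  for (std::size_t i = 0; i < sizes.size(); i++) {
    // Each thread receives a reference to a different element of chunks.
    threads.emplace_back(add_map, std::ref(chunks[i]), sizes[i]);
  }
  for (auto& t : threads) { t.join(); }

  return chunks.data() == buffer_before && chunks.size() == size_before;
}
```

If this returns true, the vector itself was never modified during the concurrent phase; only the separate map objects stored in it were, one per thread. Each map allocates its nodes and buckets on its own, so growing one map does not move its neighbours inside the vector. Calling push_back on chunks from inside the threads would be a different situation and would need synchronization such as a std::mutex.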
