First of all, I know there are many highly relevant questions, but my very first implementation (based on some suggestions from these Q&A) is not efficient enough.
I am looking for a way to (significantly) improve my very first implementation of reading a huge (>10000x10000) non-symmetric, non-sparse 2-dimensional array (matrix) with string indices from an input text file. Assume also that we don't know the size of the matrix in advance.
The structure of the external input file (think of it as a distance matrix between any two locations) looks something like this:
A B C D E F G
A 0 10 20 30 40 50 60
B 15 0 25 35 45 55 65
C 20 30 0 40 50 60 70
D 25 35 45 0 65 75 85
E 15 20 25 35 0 55 65
F 20 30 40 50 60 0 70
G 35 45 55 65 75 85 0
So far, I have come up with the following solution:
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

std::map<std::string, std::map<std::string, int>>
ReadDistancesFromFile(const char *name) {
    std::string filename(name);
    std::clog << "Trying to open and read: " << filename << std::endl;
    std::ifstream file(name);
    /// If .is_open() returns false, perror prints the message for the error code stored in errno
    if (!file.is_open()) {
        std::perror(("Error while opening file " + filename).c_str());
        return {};  // Nothing to read, so return an empty map
    }
    /// Map of maps to save all read distances
    std::map<std::string, std::map<std::string, int>> distances;
    /* 1. Is such an efficient structure (container) for my purpose:
          a) to store data efficiently
          b) to access data using indices quickly?
          c) to update values time after time
          d) insertion/deletion of new elements doesn't happen often */
    /// Vector to store all `String` type indices
    std::vector<std::string> indices;
    /// String to store index (location name)
    std::string index;
    /// Store line from the external file
    std::string line;
    /// Read the first line containing all String indices (location names)
    std::getline(file, line);
    std::istringstream iss(line);
    /// Process the first line: save all location names into `indices` vector
    while (iss >> index) {
        indices.push_back(index);
    }
    /* 2. Probably I could use .reserve() before the while loop?
          The problem that I don't know the size in advance. */
    /// Read the file via std::getline(). Rules obeyed:
    /// - first the I/O operation, then error check, then data processing
    /// - failbit and badbit prevent data processing, eofbit does not
    while (std::getline(file, line)) {
        std::istringstream is(line);
        /* 3. Is it efficient to define a stringstream variable inside a loop? */
        /// For each new line (matrix row), read the first String element (location name)
        is >> index;
        int distance;            // To store distance value
        std::size_t column = 0;  // Column number to access location names from `indices` vector
        /// Process the line further: store Int distances from the input stream,
        /// stopping at the last known column so `indices` is never read out of range
        while (column < indices.size() && is >> distance) {
            distances[index][indices[column++]] = distance;
        }
    }
    /// Only in case of set badbit we are sure that errno has been set
    /// Use perror() to print error details
    if (file.bad())
        std::perror(("Error while reading file " + filename).c_str());
    /// close file
    file.close();
    /// With C++11, std::map has move-semantics, which means the local map will be moved
    /// on return and in some cases even the move can be elided by the compiler (RVO)
    return distances;
}
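For comparison with the map-of-maps approach, here is a minimal sketch of a flat layout. It relies on the fact that the header row already gives the dimension n, so all n*n cells can be allocated in one contiguous `std::vector<int>` before any row is read, with an `std::unordered_map` translating location names to positions. The names `DistanceMatrix` and `ReadFlatMatrix` are hypothetical, and the sketch assumes every row label also appears in the header; it is not a measured or definitive solution.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical container: one contiguous buffer plus a name -> position lookup.
struct DistanceMatrix {
    std::vector<std::string> names;                      // position -> location name
    std::unordered_map<std::string, std::size_t> index;  // location name -> position
    std::vector<int> cells;                              // row-major, names.size()^2 entries

    // Throws std::out_of_range (from unordered_map::at) for unknown names.
    int& at(const std::string& from, const std::string& to) {
        return cells[index.at(from) * names.size() + index.at(to)];
    }
};

// Reads a header row of names followed by one labelled row of integers per location.
DistanceMatrix ReadFlatMatrix(std::istream& in) {
    DistanceMatrix m;
    std::string line, name;

    // The header row fixes the dimension n before any value is read.
    if (std::getline(in, line)) {
        std::istringstream header(line);
        while (header >> name) {
            m.index.emplace(name, m.names.size());
            m.names.push_back(name);
        }
    }
    const std::size_t n = m.names.size();
    m.cells.assign(n * n, 0);  // single allocation for the whole matrix

    while (std::getline(in, line)) {
        std::istringstream row(line);
        if (!(row >> name)) continue;       // skip blank lines
        const auto it = m.index.find(name);
        if (it == m.index.end()) continue;  // assumption: ignore unknown row labels
        int* out = &m.cells[it->second * n];
        int value;
        for (std::size_t col = 0; col < n && (row >> value); ++col) {
            out[col] = value;
        }
    }
    return m;
}
```

With this layout a lookup such as `m.at("B", "A")` costs two hash lookups and one array access, and updating a value writes directly into the buffer. Code that already works with positions can skip the hash lookups and index `cells` directly.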
- First, I left three questions in the source code as comments. Your answers are very welcome.
- Second, I ran a minimal benchmark with a much smaller input file (about 2000x2000), and it took around 30 seconds on my mid-range MacBook Pro (late 2015). I believe this is too long (performance really matters in my case), and I would be grateful for ideas on how to improve this code.
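To make timings like this reproducible, the call can be wrapped in a small timer. This is only a sketch: `ElapsedMs` is a hypothetical helper name. Results also depend heavily on compiler optimization settings, which are not stated above, so a debug build and an optimized build should be measured separately.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `f` once and returns the wall-clock time in milliseconds.
// steady_clock is used because it is monotonic and unaffected by system clock changes.
template <typename F>
double ElapsedMs(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(f)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

For example, `ElapsedMs([&] { distances = ReadDistancesFromFile("matrix.txt"); })` would time one full read, where `matrix.txt` is a placeholder file name.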