I am looking for suggestions on how best to implement my code for the following requirements. During execution of my c++ code, I frequently need to access data stored in a dictionary, which itself is stored in a text file. The dictionary contains 100 million entries, and at any point in time, my code would query data corresponding to some particular entry among those 100 million entries. There is no particular pattern in which those queries are made, and further during the lifetime of the program execution, not all entries in the dictionary are queried. Also, the dictionary will remain unchanged during the program's lifetime. The data corresponding to each entry is not all of the same length. The file size of my dictionary is ~24 GB, and I have only 16 GB of RAM memory. I need my application to be very fast, so I would like to know how best to implement such a system so that read access times can be minimized.
I am also the one who is creating the dictionary, so I do have the flexibility in breaking down my dictionary into several smaller volumes. While thinking about what I can do, I came up with the following, but not sure if either are good.
- If I store the line offset for each entry in my dictionary from the beginning of the file, then to read the data for the corresponding entry, I can directly jump to the corresponding offset. Is there a way to do this using say ifstream without looping through all lines until the offset line? A quick search on the web seems to suggest this is not possible atleast with ifstream, are there are other ways this can be done?
- The other extreme thought was to create a single file for each entry in the dictionary, so I would have 100 million files. This approach has the obvious drawback of overhead in opening and closing the file stream.
In general I am not convinced either of the approaches I have in mind are good, and so I would like some suggestions.
Aucun commentaire:
Enregistrer un commentaire