I have a million ASCII strings, without duplicates, each at most 7 bytes long. I need to map each string to a positive integer. The largest of these ints should be not much more than a million. Although initialization may be slow, lookup should be fast: given a string, return the corresponding int (or -1, if not found). How can one implement this in C++11?
One solution: accumulate the strings into a std::unordered_map<string,int>
; then iterate over the map, assigning the ints from an incrementing counter. Then to lookup, just unordered_map::find("foo")->second
. But it smells like some other container would be faster and have less overhead (indices built in, rather than hand-coded). Maybe unordered_set
and pointer arithmetic??
The range restriction seems to make a perfect hash difficult.
(The int's range is restricted, because it indexes into a feature vector passed to svm_light. That software doesn't use sparse storage, so vectors with trillions of (mostly zero) elements make it run out of memory. So this string-to-int preprocessing sort of implements a sparse data structure.)
Aucun commentaire:
Enregistrer un commentaire