samedi 29 juillet 2017

What is a good datastructure for runtime efficient filtering in C++?

I have a std::vector<std::vector<double>> samples of representing a list of sampling points in an n-dimensional vector space. I want to evaluate these points with a function and select those with the highest score, but since samples is the result of a cartesian product routine (i.e. the set of all possible combinations of each one-dimensional sampling) it is quite large in general (nSamplesPerDimension^nDimensions).

Therefore I want to prepend a filtering process to the rigorous evaluation of all points which preselects a smaller number of hopefully 'good' candidates (fixed number, e.g. 1e6).

Note that following this filtering, I still need to evaluate the candidates!

My current approach is to use std::multimap<double, vector<double> *> candidates mapping score to candidates, populate this map during the filtering, reorder the points according to the rigorous evaluation and select the best 100 or so.

The issue is, that the filtering procedure uses a non deterministic approach that only gives a very moderate hint to the final score obtained by evaluating a candidate std::vector<double>, therefore the real score might be quite different than the temporary one obtained during filtering, requiring a reordering of the map.

Also, due to the score being used as a key I need to delete the entry corresponding to the temporary key and reinsert with the real score.

Last but not least: I am mainly interested in runtime efficiency (I hope from the numbers up there you see that this is not falling into the category of premature optimization). While memory obviously is an issue as well, I will try to incorporate the filtering during the creation of the samples, i.e. preventing some of the samples to be generated in the first place.

  • This seems to me like a pretty standard workflow, are there any best practices for it?

  • Does it make sense to use different containers for filtering and evaluation (possibly with some algorithm for selecting the k best afterwards)?

Aucun commentaire:

Enregistrer un commentaire