One of consequences of the IEEE 754 standard is the non-intuitive behavior of std::unordered_set<double>
, when not-a-number elements (NAN
s) are inserted.
Due to the fact that NAN!=NAN
, after the following sequence:
#include <iostream>
#include <cmath>
#include <unordered_set>
int main(){
std::unordered_set<double> set;
set.insert(NAN);
set.insert(NAN);
std::cout<<"Number of elements "<<set.size()<<"\n"; //there are 2 elements!
}
there are two elements in the set
(see it live): NAN
and NAN
!
Mine main issue with this is, that when N
NAN
s are inserted into the hash-set, they all hit the same hash-bucket and the performance of N
inserts into the hash-set degenerates to the worst-case running time - O(N^2)
.
For an example, see the listing at the end of the question or here live - inserting NAN
takes some order of magnitude more time than a "normal" floating number.
My question: is it possible (and if yes - how) to tweak std::unordered_set<double>
in such a way, that there is at most one NAN
-element in the set, no matter the flavor of inserted NAN
s (NAN
, -NAN
and so on)?
Listing:
#include <iostream>
#include <cmath>
#include <unordered_set>
#include <chrono>
constexpr int N=5000;
void test_insert(double value)
{
std::unordered_set<double> s;
auto begin = std::chrono::high_resolution_clock::now();
for (int i = 0; i < N; i++) {
s.insert(value);
}
auto end = std::chrono::high_resolution_clock::now();
std::cout << "Duration: " << (std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count() / 1e9) << "\n";
std::cout << "Number of elements: "<<s.size()<<"\n";
}
int main(){
std::cout << "Not NAN\n";
test_insert(1.0); //takes 0.0001 s
std::cout << "NAN\n";
test_insert(NAN); //takes 0.2 s
}
Aucun commentaire:
Enregistrer un commentaire