Friday, January 10, 2020

Qt C++11 string lists intersection via substring comparison

I have a problem comparing string lists. I need good performance, so I looked at several approaches, and now I am trying to use std::set_intersection for it.

The main problem is that I need to compare the strings by a substring. For example, I have these lists:

List 1:

111111
111222
222333
333444

List 2:

444111
555222
666333
777444
555111
222111
111555

Suppose I apply a filter that takes the substring of the first 3 digits of each string (just an example; I have already written this function). The intersection I need as a result is:

111111
111222
111555
222111
222333

My code currently looks like this:

#include <QPair>
#include <QSharedPointer>
#include <algorithm>
#include <iterator>
#include <list>
#include <string>

// std::string getCheckBody( const std::string& str, QPair<int, int> filters )
// data1 / data2 - sources used to build the std::lists; they don't matter here :)

std::list<std::string> leftStdList( data1 );
std::list<std::string> rightStdList( data2 );
auto result = QSharedPointer<std::list<std::string>>( new std::list<std::string>() );

// orders strings by the substring getCheckBody() extracts:
// shorter keys first, equal-length keys lexicographically
auto byKey = [=] (const std::string& one, const std::string& two) -> bool {
     auto o = getCheckBody( one, filters );
     auto t = getCheckBody( two, filters );

     // got this condition from another thread here
     return  ( o.size() == t.size() )
             ? ( o < t )
             : ( o.size() < t.size() );
};

std::set_intersection( leftStdList.begin(), leftStdList.end(),
                       rightStdList.begin(), rightStdList.end(),
                       std::back_inserter( *result ), byKey );

But instead I get this result:

111111
222333

First, it ignores the other values in the first list that have the same key; second, it ignores the second list entirely. The second issue could be solved by calling the function again with the lists swapped. But how can I get all the values I need into one list (at least)?
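This output matches how std::set_intersection is specified: both ranges must already be sorted by the same comparator, two strings with the same key count as equivalent, matching elements are copied only from the first range, and for a key that occurs m times in the first range and n times in the second, only the first min(m, n) elements of the first range are copied. A minimal sketch reproducing this, assuming the key is simply the first 3 characters (`keyOf` is a hypothetical stand-in for `getCheckBody`), and sorting both copies first because sortedness is a precondition:

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for getCheckBody(): the key is the first 3 characters.
std::string keyOf(const std::string& s) { return s.substr(0, 3); }

// Same ordering as the question's lambda, applied to the extracted keys:
// shorter keys first, equal-length keys lexicographically.
bool byKey(const std::string& a, const std::string& b) {
    const std::string ka = keyOf(a), kb = keyOf(b);
    return ka.size() == kb.size() ? ka < kb : ka.size() < kb.size();
}

// Runs std::set_intersection on key-sorted copies of both lists.
std::vector<std::string> setIntersectionByKey(std::vector<std::string> a,
                                              std::vector<std::string> b) {
    // Precondition of std::set_intersection: both ranges sorted by the same comparator.
    std::stable_sort(a.begin(), a.end(), byKey);
    std::stable_sort(b.begin(), b.end(), byKey);
    std::vector<std::string> out;
    // Copies only from `a`, and at most min(m, n) elements per shared key.
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(out), byKey);
    return out;
}
```

With the two lists above, `setIntersectionByKey(list1, list2)` returns 111111 and 222333, and swapping the arguments returns 111555 and 222111, so running it in both directions and merging the results still misses 111222.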

I have never used comparison functions with algorithms before, especially for comparing strings, so I suspect my condition is wrong. Or maybe std::set_intersection is the wrong algorithm for this?
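The size-then-lexicographic condition is a valid strict weak ordering on the keys, so it can be used with the standard algorithms; the limiting factor is what std::set_intersection does with equivalent elements. One alternative, sketched here under the assumption that the key is the first 3 characters (`keyOf` is a hypothetical stand-in for `getCheckBody`), is to sort both lists by key and walk them in parallel like a merge, emitting the whole run of equal keys from both sides. Each key is computed once per element instead of on every comparison, and plain `<` on the keys is used; the question's size-first ordering would group the keys the same way.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for getCheckBody(): the key is the first 3 characters.
std::string keyOf(const std::string& s) { return s.substr(0, 3); }

using Keyed = std::pair<std::string, std::string>;  // (key, original string)

// Computes each key once, then sorts by key only.
std::vector<Keyed> sortedByKey(const std::vector<std::string>& v) {
    std::vector<Keyed> out;
    out.reserve(v.size());
    for (const auto& s : v) out.emplace_back(keyOf(s), s);
    std::stable_sort(out.begin(), out.end(),
                     [](const Keyed& x, const Keyed& y) { return x.first < y.first; });
    return out;
}

// Merge-style walk over two key-sorted lists: whenever both sides contain
// the same key, every element of both runs goes to the output.
std::vector<std::string> keyIntersectionSorted(const std::vector<std::string>& a,
                                               const std::vector<std::string>& b) {
    const std::vector<Keyed> l = sortedByKey(a), r = sortedByKey(b);
    std::vector<std::string> out;
    std::size_t i = 0, j = 0;
    while (i < l.size() && j < r.size()) {
        if (l[i].first < r[j].first) { ++i; continue; }
        if (r[j].first < l[i].first) { ++j; continue; }
        const std::string key = l[i].first;  // key of the current matching run
        for (; i < l.size() && l[i].first == key; ++i) out.push_back(l[i].second);
        for (; j < r.size() && r[j].first == key; ++j) out.push_back(r[j].second);
    }
    std::sort(out.begin(), out.end());  // optional: same order as the expected result
    return out;
}
```

Sorting dominates the cost here, so it is O(n log n + m log m) for lists of sizes n and m.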

As for data sizes, the lists usually hold about 100k strings each, so I'm really looking for a way to optimize this.
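For lists of this size, a hash-based variant avoids sorting the inputs: collect the keys of each list into a `std::unordered_set`, then keep every element whose key is present in the other list's set. A minimal sketch, assuming the key is the first 3 characters (`keyOf` is a hypothetical stand-in for `getCheckBody`); building and probing the sets is linear on average in the total size of both lists, and the final sort is there only to match the order shown in the question.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for getCheckBody(): the key is the first 3 characters.
std::string keyOf(const std::string& s) { return s.substr(0, 3); }

// Collects the set of keys present in a list.
std::unordered_set<std::string> keysOf(const std::vector<std::string>& v) {
    std::unordered_set<std::string> keys;
    keys.reserve(v.size());
    for (const auto& s : v) keys.insert(keyOf(s));
    return keys;
}

// Keeps every element of either list whose key also occurs in the other list.
std::vector<std::string> keyIntersectionHashed(const std::vector<std::string>& a,
                                               const std::vector<std::string>& b) {
    const auto keysA = keysOf(a);
    const auto keysB = keysOf(b);
    std::vector<std::string> out;
    for (const auto& s : a) if (keysB.count(keyOf(s))) out.push_back(s);
    for (const auto& s : b) if (keysA.count(keyOf(s))) out.push_back(s);
    std::sort(out.begin(), out.end());  // only to match the ordering in the question
    return out;
}
```

With the example lists this yields 111111, 111222, 111555, 222111, 222333. If the output order does not matter, dropping the final sort keeps the whole pass linear on average.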

Can you help me find a solution, please? Any advice on this task is welcome.

Thanks
