dimanche 24 mai 2015

performing intersection of sorted vectors C++

I need to intersect a vector(sorted) of unsigned integers of variable size with 30,000 other vectors(all sorted) of unsigned integers. E.g.

vector1 (of variable size e.g. 1,20,30,1000,10000) needs to be intersected with the following vectors: vector2, vector3,...vector30000. Here size of vector2,vector3,...vector29999 is 10000 unsigned integers and the size of vector30000 (i.e. the last vector) is again variable. Now I need to find out amongst vector2,vector3,...,vector30000 which vectors give non-zero intersection with vector1 (and if the intersection is non-zero, then which elements intersect). In order to do so I wrote the following multi-threaded code, so that I may parallely perform intersection on the individual vector's vector2,vector3,...,vector30000 and hence improve performance of intersection. However, I am not getting how should I restrict the number of threads spawned to the level of concurrency supported by the hardware(std::thread::hardware_concurrency();).

void multThreadIntersect(vector<unsigned>& vector1, vector<vector<unsigned> >::iterator it, int size,int i,vector<vector<unsigned> >& intersected,vector<int>& idIntersected)
{
    if(i<size) 
    {        
        vector<unsigned> intersectedLocal;
        std::set_intersection(vector1.begin(),vector1.end,(*it).begin(),(*it).end(),back_inserter(intersectedLocal));
        it++;
        idIntersected.push_back(i);
        intersected.push_back(intersectedLocal);
        auto future = std::async(std::launch::async,multThreadIntersect, vector1, it, size,intersected,idIntersected);
        future.wait();
        i++;
    }
    else
    {
        return;        
    }
}

int main()
{
    vector<unsigned> vector1;
    vector1.push_back(10); vector1.push_back(20); vector1.push_back(30);
    vector<vector<unsigned> > vecOfVectors;
    vector<unsigned> vector2;
    vector2.push_back(1); vector2.push_back(5); vector2.push_back(8); vector2.push_back(10);
    vecOfVectors.push_back(vector2);
    vector<unsigned> vector3;
    vector3.push_back(3); vector3.push_back(20); vector3.push_back(25);
    vecOfVectors.push_back(vector3);
    vector<unsigned> vector4;
    vector4.push_back(28); vector4.push_back(29); vector4.push_back(39);
    vecOfVectors.push_back(vector4);
    vector<vector<unsigned> >::iterator it=vecOfVectors.begin();
    int size=vecOfVectors.size();
    int i=0;
    vector<vector<unsigned> > intersected;
    vector<int> idIntersected; //contains those i's whose intersection was non-zero
    long unsigned int nThreads = std::thread::hardware_concurrency();
    multThreadIntersect(vector1,it,size,i,intersected,idIntersected);    
    cout<<"id intersected vector:";
    for(vector<int>::iterator it=idIntersected.begin(),l=idIntersected.end();it!=l;++it)
        cout<<" "<<(*it);
    cout<<"\n";
}

Also please suggest whether the algorithm I have written is correct or not. If the algorithm is incorrect, then please help me improve the algorithm.

The gcc version that I am using is: gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-15)

Aucun commentaire:

Enregistrer un commentaire