jeudi 28 mai 2015

How to find vectors efficiently in another data structure

I have written the following code for finding a vector<unsigned> (sp) in another data structure sbp (of vector<pair<unsigned, pair<vector<unsigned>, vector<unsigned> > > >). However, since the size of vector<pair<unsigned, pair<vector<unsigned>, vector<unsigned> > > > sbp is 1/2 billion elements, therefore finding another vector in it is consuming a lot of time. The code is running since the last 24 hours but seems to be stuck in an infinite loop. The data structure vector<vector<unsigned> > contains the vectors (size 20,000 vectors) which need to be looked-in in vector<pair<unsigned, pair<vector<unsigned>, vector<unsigned> > > > sbp.

typedef std::pair<std::vector<unsigned>, std::vector<unsigned> > vec_pair;
struct awd
{
public:
    explicit awd(const std::set<std::vector<unsigned> >& wtd) :  wtd(wtd) {}

    bool operator () (const pair<unsigned,vec_pair> & p) const {
        return wtd.count((p.second).second) != 0;
    }
private:
    const std::set<std::vector<unsigned> >& wtd;
};

bool msf(const pair<unsigned,vec_pair>& a, const pair<unsigned,vec_pair>& b)
{
    if((a.second).first.size() == (b.second).first.size()) {
        return std::lexicographical_compare((a.second).first.begin(), (a.second).first.end(), (b.second).first.begin(), (b.second).first.end());
    } else {
        // Sort by size.
        return (a.second).first.size() < (b.second).first.size();
    }
}

bool mss(const pair<unsigned,vec_pair>& a, const pair<unsigned,vec_pair>& b)
{
    if((a.second).second.size() == (b.second).second.size()) {
        return std::lexicographical_compare((a.second).second.begin(), (a.second).second.end(), (b.second).second.begin(), (b.second).second.end());
    } else {
        return (a.second).second.size() < (b.second).second.size();
    }
}

int main(int argc, char** argv) {
    vector<vector<unsigned> > sp; //vector of size 20,000  elements
    vector<pair<unsigned, pair<vector<unsigned>, vector<unsigned> > > > sbp; //vector of size 1/2 billion elements
    for(vector<vector<unsigned> >::iterator itsp=sp.begin(),lsp=sp.end();itsp!=lsp;++itsp)
    {
        std::set<std::vector<unsigned> > wtd;        
        wtd.insert(*itsp);
        vector<pair<unsigned, pair<vector<unsigned>, vector<unsigned> > > > lsbp=sbp;
        std::vector<pair<unsigned,vec_pair> >::iterator mid = std::partition(lsbp.begin(), lsbp.end(), awd(wtd));    
        std::sort(lsbp.begin(), mid, mss);
        std::sort(lsbp.begin(), mid, msf);

        for (std::vector<pair<unsigned,vec_pair> >::iterator itSub = lsbp.begin(); itSub != mid; ++itSub) 
        {                          
            cout<<(itSub->first);
        }
    }

    return 0;
}

I am ok even if the proposed solution is a parallel implementation using openmp or any other such variant. I am using gcc 4.8.2

I insert values in "sp" and "sbp" once, but it will be great if the proposed solution is also able to insert values in "sp" and "sbp" efficiently

Aucun commentaire:

Enregistrer un commentaire