Tuesday, 31 January 2017

Results change based on non-modifying code

I am working on a Linux server, and I am treating strings as lengths of DNA. I want to see whether one set of DNA sequences can "collide" with another set. Two sequences collide when they are identical but did not originate from the same length of DNA.
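To make the definition concrete, here is a minimal sketch of the collision test. The names `Derived`, `origin` and `collides` are hypothetical and used only for illustration; the program below stores a pointer to the original strand rather than an index.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical record: a derived sequence plus the index of the
// original strand it was produced from (an assumption for this sketch).
struct Derived {
    std::string seq;
    std::size_t origin;
};

// Two derived sequences collide when their text is identical
// but they were produced from different original strands.
bool collides(const Derived& a, const Derived& b) {
    return a.seq == b.seq && a.origin != b.origin;
}
```

With this definition, two equal sequences that share an origin are not a collision, and neither are two different sequences from different origins.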

Here is the data, in 5test.txt:

03111
11013
22002
22133
33122
33121

Here is main.cpp:

#include        <iostream>
#include        <fstream>
#include        <string>
#include        <vector>
#include        <cstdlib>
#include        <typeinfo>
using namespace std;

inline string insert(const string& who, int where, string what)
{
        string temp = who;
        temp.insert(where, what);
        return temp;
}

struct c_mDNA                               //holds the DNA sequences, remembering where it came from
{
        string seq;
        const string* orig;
};

ostream& operator<<(ostream& os, c_mDNA& m) //to print out debug info easier
{
        os << "seq: " << m.seq << "\torig: " << *m.orig << endl;
        return os;
}

int main()
{

        ifstream input; string inputname;               //These next couple lines deal with input
        inputname = "5test.txt";
        input.open(inputname.c_str());
        string line;                                    //line will hold the inputted lines
        int n = 5;                                      //we're working with length 5 as a test
        vector<string> oDNA;                            //this holds all of the original strands
        vector<c_mDNA> mDNA, iDNA;                      //this will hold all of the mutated strands, m being the deleted and i being the possible insertions

        //input loop
        while (getline(input, line))
        {
                //change line from a sequence of numbers to nucleotide ACTG
                ...
                oDNA.push_back(line);
        }

        //insert loop
        for(auto oliga : oDNA)
        {
                for (int i = 0; i < n; i++)
                {
                        iDNA.push_back(c_mDNA { insert(oliga, i, "A"), &oliga } );
                        cout << iDNA.back() << endl;
                        //do the above for the other 3 nucleotides
                        ...                            
                }

                //these next couple lines are important

                //for (auto m : iDNA)
                //{
                //      cout  << m << endl;
                //}
        }

        //mutate loop
        for (auto& oliga : oDNA)
        {   
                for (int i = 0; i < oliga.length(); i++)
                {
                        string temp = oliga;
                        temp.erase(i,1);

                        //There are 16 different combinations of two nucleotides
                        mDNA.push_back(c_mDNA{temp + "AA", &oliga});
                        mDNA.push_back(c_mDNA{temp + "CA", &oliga});
                        mDNA.push_back(c_mDNA{temp + "TA", &oliga});
                        mDNA.push_back(c_mDNA{temp + "GA", &oliga});
                        mDNA.push_back(c_mDNA{temp + "AC", &oliga});
                        mDNA.push_back(c_mDNA{temp + "CC", &oliga});
                        mDNA.push_back(c_mDNA{temp + "TC", &oliga});
                        mDNA.push_back(c_mDNA{temp + "GC", &oliga});
                        mDNA.push_back(c_mDNA{temp + "AT", &oliga});
                        mDNA.push_back(c_mDNA{temp + "CT", &oliga});
                        mDNA.push_back(c_mDNA{temp + "TT", &oliga});
                        mDNA.push_back(c_mDNA{temp + "GT", &oliga});
                        mDNA.push_back(c_mDNA{temp + "AG", &oliga});
                        mDNA.push_back(c_mDNA{temp + "CG", &oliga});
                        mDNA.push_back(c_mDNA{temp + "TG", &oliga});
                        mDNA.push_back(c_mDNA{temp + "GG", &oliga});

                }
        }

        //check loop
        for (auto m : iDNA)
        {
                cout  << m << endl;
        }

        ofstream out("5out_test.txt");    
        int collisions(0);

        //output loop
        for (const auto& m_oliga : mDNA)
        {
                bool collide = false; c_mDNA collude;   //collude stores the collided codeword
                for (const auto& i_oliga : iDNA)
                {
                        if (m_oliga.seq == i_oliga.seq) //if sequences are the same
                        {
                                if ( m_oliga.orig != i_oliga.orig) //if the original seqs are different
                                {
                                        cout << *m_oliga.orig << " and " << *i_oliga.orig << endl;
                                        cout << m_oliga.orig << " and " << i_oliga.orig << endl;
                                        collide = true;
                                        collude = i_oliga;
                                        collisions++;
                                        break;
                                }
                        }
                }

                if (collide) out << m_oliga.seq << "    orig: " << *m_oliga.orig << "   collides with: " << collude.seq << " from: " << *collude.orig << endl;
                else out << m_oliga.seq << "    orig: " << *m_oliga.orig << endl;
        }

        return 0;
}

I have labelled the five loops "input", "insert", "mutate", "check" and "output". There is a copy of the "check" loop inside the "insert" loop that I have commented out.

This is creeping me out. When I leave that copy commented out, the "check" loop prints garbage like this:

 seq: GCGTAT     orig: GCGTAT

orig should be a length-5 string, and it should point to an element of the oDNA vector. When the "output" loop finds a collision, it prints this to the screen:

GGGTA and
0x61cf80 and 0x7fffffffd6a0

The first line doesn't print anything for *i_oliga.orig, even though the pointer is still pointing somewhere.

Now when I uncomment the first "check" loop:

seq: GCGTAT     orig: GCGTT

GGGTA and GCGTT
0x61cf80 and 0x7fffffffd650

For some reason, the pointer is still pointing to a completely different place, but I am getting the answer that I want. I have tested to make sure that this behavior is consistent.

Why does the commented-out loop change the results?

Another thing that might be useful to know: when I copy main.cpp and 5test.txt to my home computer and run the program in Visual Studio 15, I only ever get garbage results.
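One detail in the listing may be worth checking: the "insert" loop iterates with `for(auto oliga : oDNA)` and stores `&oliga`, while the "mutate" loop iterates with `auto&`. The sketch below is a diagnostic, not a confirmed explanation. It tests whether the address of the loop variable is really the address of an element of the vector in each style. The helper names are hypothetical, and the by-reference version uses `const auto&` where the listing uses `auto&`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True if p is the address of one of v's elements.
bool points_into(const std::vector<std::string>& v, const std::string* p) {
    for (const std::string& s : v)
        if (&s == p) return true;
    return false;
}

// Iterate by value, as the "insert" loop does: s is a fresh copy on
// every iteration, so &s is the address of that copy.
std::size_t element_addresses_by_value(const std::vector<std::string>& v) {
    std::size_t hits = 0;
    for (auto s : v)
        if (points_into(v, &s)) ++hits;
    return hits;
}

// Iterate by reference, as the "mutate" loop does: s names the element
// itself, so &s is the address of the element stored in the vector.
std::size_t element_addresses_by_reference(const std::vector<std::string>& v) {
    std::size_t hits = 0;
    for (const auto& s : v)
        if (points_into(v, &s)) ++hits;
    return hits;
}
```

For a vector of three strands, the by-value count is 0 and the by-reference count is 3. If the same thing happens in the "insert" loop, every `orig` in `iDNA` would hold the address of a loop variable whose lifetime has ended. Reading through such a pointer is undefined behavior, which could explain why seemingly unrelated code changes what gets printed. The two printed addresses (0x61cf80 and 0x7fffffffd6a0) being so far apart would fit that reading, though this is an inference from the output rather than something verified.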
