I have a program that reads a text file of DNA strings, splits each one into k-mer substrings, and counts how many times each unique substring occurs. My only problem is getting it to recognize the character "N" in the file and ignore it. For example, given a text file like this:
3 3
ACNTG
ACTG
ACTG
the program would split each DNA sequence into k-mers of length 3, hence the first integer. The issue is that I want to ignore the N and move on, so that no counted substring contains an N. The output would then be
ACT, CTG, TGA... and so on, with the N ignored. Here is the portion of my code where I believe the check belongs:
#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
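// Portion of main(): s is one DNA line read from the file, k is the k-mer length.
std::string kmer;
std::unordered_map<std::string, int> dna;
// Slide a window of length k over s, starting at i (the original restarted at 0 every pass).
for (std::size_t i = 0; i + k <= s.length(); ++i) {
    kmer = s.substr(i, k);
    // Check the k-mer itself, not the map: dna.find("N") looks up a key and
    // returns an iterator, which cannot be compared with std::string::npos.
    if (kmer.find('N') == std::string::npos) {
        dna[kmer]++;
    }
}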
for (std::unordered_map<std::string,int>::iterator it=dna.begin(); it!=dna.end(); ++it){
std::cout << it->first << " " << it->second << std::endl;
}
f.close();
return 0;
}
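A minimal sketch of the counting step as a standalone function, assuming the goal is to skip every k-mer that contains an N (rather than deleting the N and joining the neighbouring bases). The name `countKmers` and its signature are hypothetical and not part of the original program.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper (not from the original post): adds every k-mer of `s`
// that contains no 'N' to `counts`.
void countKmers(const std::string& s, std::size_t k,
                std::unordered_map<std::string, int>& counts) {
    // Nothing to count if k is zero or the sequence is shorter than k.
    if (k == 0 || s.length() < k) return;

    // Slide a window of length k across the sequence.
    for (std::size_t i = 0; i + k <= s.length(); ++i) {
        const std::string kmer = s.substr(i, k);
        // Search inside the k-mer string; skip it if an 'N' is present.
        if (kmer.find('N') != std::string::npos) continue;
        ++counts[kmer];
    }
}
```

Called once per line with k = 3 on the sample sequences ACNTG, ACTG and ACTG, this leaves two entries in the map, ACT and CTG, each with a count of 2. Every window of ACNTG contains the N, so that line contributes nothing.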