Saturday, May 23, 2020

How do I search for a UTF-8 character?

I'm using C++11. The input is a text file encoded in UTF-8, and the program reads it line by line and checks whether a given character appears in each line. Since UTF-8 is backward compatible with ASCII, the newline is the same byte (0x0A) in both encodings. I'm not using std::wstring, just std::string: I take the UTF-8 encoded bytes of the character and build a std::string from them. For example, I need to search for 值 (U+503C) in each line, and the UTF-8 encoding of this character is the byte sequence 0xE5 0x80 0xBC. So I have something like this. Does it look right, and will it work?

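#include <fstream>
#include <string>
using namespace std;

ifstream input(utf8file);
string line;
const string t("\xe5\x80\xbc"); // UTF-8 bytes for 值 (U+503C)
// Loop on getline itself: testing `input` before reading would run the
// body once more after the last line, on a failed read at EOF.
while (getline(input, line)) {
    if (line.find(t) != string::npos) {
        do_found();   // the program's own handler
    } else {
        not_found();  // the program's own handler
    }
}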
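The approach above can be sketched as a self-contained C++11 example, assuming the input is valid UTF-8. It derives the byte sequence from the code point instead of hard-coding it, and runs the same line-by-line search over any std::istream. The helper names encode_utf8 and lines_containing are hypothetical, not part of the question or of any library. A plain byte search with std::string::find is safe here because UTF-8 is self-synchronizing: ASCII bytes, lead bytes and continuation bytes (0x80 to 0xBF) occupy disjoint ranges, so a complete encoded character can only match starting at a character boundary, never in the middle of another character.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>  // std::istringstream, handy for testing without a file
#include <string>
#include <vector>

// Hypothetical helper: encode one Unicode code point as its UTF-8 bytes,
// packed into a std::string (the same thing "\xe5\x80\xbc" spells by hand).
std::string encode_utf8(char32_t cp) {
    // Surrogates and values above U+10FFFF are not valid scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {                  // 1 byte:  0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {          // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                          // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: return the 1-based numbers of the lines in `in`
// whose bytes contain `needle`.
std::vector<std::size_t> lines_containing(std::istream& in,
                                          const std::string& needle) {
    std::vector<std::size_t> hits;
    std::string line;
    std::size_t lineno = 0;
    while (std::getline(in, line)) {  // stops cleanly at EOF
        ++lineno;
        if (line.find(needle) != std::string::npos) {
            hits.push_back(lineno);
        }
    }
    return hits;
}
```

With a real file, the std::ifstream from the question can be passed straight to lines_containing; for a quick check, a std::istringstream holding a few lines works the same way, with encode_utf8(0x503C) as the needle.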
