Thursday, January 23, 2020

C++: how to replace Unicode escape sequences for Latin characters?

Starting from a text that contains escape sequences such as \u00f9, \u00a0, and \u00e8, I would like to replace them with the characters they stand for (ù, è, etc.).

Here is my current implementation. For some reason it occasionally deletes pieces of other words, and I don't understand why:

pos1 = str2.find("\\u00a0");
pos2 = str2.find("\\u00");
pos3 = str2.find("\\u20");
pos4 = str2.find("\\r\\n");
while (pos1 != std::string::npos)
{
    str2.replace(pos1, 6, "");
    pos1 = str2.find("\\u00a0");
}
while (pos2 != std::string::npos)
{
    str2.replace(pos2, 6, "?");
    pos2 = str2.find("\\u00");
}
while (pos3 != std::string::npos)
{
    str2.replace(pos3, 6, "?");
    pos3 = str2.find("\\u20");
}
while (pos4 != std::string::npos)
{
    str2.replace(pos4, 2, "\n");
    pos4 = str2.find("\\r\\n");
}
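
A possible cause, offered as a reading of the snippet rather than a confirmed diagnosis: all four positions are computed before any loop runs, so by the time the second loop starts, pos2 may be stale. In particular, "\\u00" also matches the start of "\\u00a0", so when a \u00a0 is the first escape in the text, pos2 equals pos1; after the first loop has erased that sequence, replace(pos2, 6, "?") overwrites six characters of ordinary text. Separately, "\\r\\n" is four characters long but only two are replaced. The sketch below uses a hypothetical helper, replaceAll (not part of the original code), that searches again after every replacement:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not from the original post: replaces every
// occurrence of `from`, plus `extra` characters following it, with `to`.
// The position is recomputed after each replacement, so it is never stale.
std::string replaceAll(std::string text, const std::string& from,
                       std::size_t extra, const std::string& to)
{
    std::size_t pos = text.find(from);
    while (pos != std::string::npos) {
        text.replace(pos, from.size() + extra, to);
        // Resume the search just past the text that was inserted.
        pos = text.find(from, pos + to.size());
    }
    return text;
}
```

With this helper the four loops would become, for example, str2 = replaceAll(str2, "\\u00a0", 0, ""); then str2 = replaceAll(str2, "\\u00", 2, "?"); and similarly for "\\u20" (with extra = 2) and "\\r\\n" (with extra = 0 and "\n" as the replacement).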

Here is an example of the text:

William Shakespeare \u00e8 stato un drammaturgo e poeta inglese, considerato come il pi\u00f9 importante scrittore in inglese e generalmente ritenuto il pi\u00f9 eminente drammaturgo della cultura occidentale.\u00a0\r\n
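
To actually get ù and è instead of "?", one option is to decode each \uXXXX sequence into UTF-8 in a single pass over the string. The following is a minimal sketch under several assumptions: the output should be UTF-8, only code points below U+10000 appear (surrogate pairs are not combined), a literal \r is simply dropped, and an escaped backslash (\\) is not treated specially. The names decodeEscapes, appendUtf8 and hexValue are made up for illustration.

```cpp
#include <cstddef>
#include <string>

// Append code point cp (assumed below 0x10000) to out, encoded as UTF-8.
static void appendUtf8(std::string& out, unsigned cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Return the numeric value of a hex digit, or -1 if c is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode literal \uXXXX, \r and \n sequences found in the input text.
std::string decodeEscapes(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            const char next = in[i + 1];
            if (next == 'u' && i + 5 < in.size()) {
                unsigned cp = 0;
                bool ok = true;
                for (std::size_t k = i + 2; k < i + 6; ++k) {
                    const int v = hexValue(in[k]);
                    if (v < 0) { ok = false; break; }
                    cp = cp * 16 + static_cast<unsigned>(v);
                }
                if (ok) { appendUtf8(out, cp); i += 6; continue; }
            } else if (next == 'r') {
                i += 2;            // drop the literal \r
                continue;
            } else if (next == 'n') {
                out += '\n';       // literal \n becomes a real newline
                i += 2;
                continue;
            }
        }
        out += in[i++];            // anything else is copied unchanged
    }
    return out;
}
```

Because the input is never modified while it is being scanned, no position can go stale. Note that \u00a0 decodes to a non-breaking space (bytes C2 A0 in UTF-8); if it should be removed or turned into a plain space, that case would need its own check before the call to appendUtf8.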
