I have a text that contains escape sequences such as \u00f9, \u00a0 and \u00e8, and I would like to replace them with the characters they represent (ù, è, etc.).
This is my current implementation. For some reason it occasionally deletes pieces of other words, and I don't understand why:
pos1 = str2.find("\\u00a0");
pos2 = str2.find("\\u00");
pos3 = str2.find("\\u20");
pos4 = str2.find("\\r\\n");
while (pos1 != std::string::npos)
{
    str2.replace(pos1, 6, "");
    pos1 = str2.find("\\u00a0");
}
while (pos2 != std::string::npos)
{
    str2.replace(pos2, 6, "?");
    pos2 = str2.find("\\u00");
}
while (pos3 != std::string::npos)
{
    str2.replace(pos3, 6, "?");
    pos3 = str2.find("\\u20");
}
while (pos4 != std::string::npos)
{
    str2.replace(pos4, 2, "\n");
    pos4 = str2.find("\\r\\n");
}
Here is an example of the text:
William Shakespeare \u00e8 stato un drammaturgo e poeta inglese, considerato come il pi\u00f9 importante scrittore in inglese e generalmente ritenuto il pi\u00f9 eminente drammaturgo della cultura occidentale.\u00a0\r\n
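
For comparison, here is a minimal sketch of a single-pass alternative. It is not a confirmed fix for the disappearing characters. It assumes the input really contains a literal backslash followed by `u` and exactly four hex digits, that the escapes are in the Basic Multilingual Plane (surrogate pairs are not handled), and that UTF-8 output is acceptable. The names `decode_escapes`, `append_utf8` and `hex_value` are hypothetical helpers, not part of any library. Two things in the code above seem worth checking against it: every `replace` erases a fixed 6 characters, which only fits when a match is really followed by two more hex digits, and the last loop searches for the 4-character sequence `\r\n` but replaces only 2 characters of it.

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of a code point in the range U+0000..U+FFFF.
static void append_utf8(std::string& out, unsigned cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Return the value of a hex digit, or -1 if c is not one.
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode literal \uXXXX, \r and \n sequences in one pass.
// Anything that is not a complete, valid escape is copied through unchanged,
// so neighbouring text is never erased.
std::string decode_escapes(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ) {
        if (in[i] == '\\' && i + 1 < in.size()) {
            char kind = in[i + 1];
            if (kind == 'u' && i + 5 < in.size()) {
                unsigned cp = 0;
                bool ok = true;
                for (std::size_t k = 2; k < 6; ++k) {
                    int v = hex_value(in[i + k]);
                    if (v < 0) { ok = false; break; }
                    cp = cp * 16 + static_cast<unsigned>(v);
                }
                if (ok) {
                    append_utf8(out, cp);
                    i += 6;  // consume the whole \uXXXX escape
                    continue;
                }
            } else if (kind == 'n') {
                out += '\n';
                i += 2;
                continue;
            } else if (kind == 'r') {
                i += 2;  // assumption: carriage returns are simply dropped
                continue;
            }
        }
        out += in[i++];  // ordinary character, or an incomplete escape
    }
    return out;
}
```

Calling `decode_escapes(str2)` on the sample text would turn `pi\u00f9` into `più` when the result is viewed as UTF-8. Note that `\u00a0` becomes a real non-breaking space here rather than being removed; if it should be dropped or turned into a plain space, that case would need its own branch.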