Monday, January 25, 2016

Different outputs when using C++11 \u vs \x to output a Unicode string?

This is a simple program that should output four Unicode glyphs. The glyphs are composed of five code points in total, which take 14 bytes when encoded as plain UTF-8.

My impression is that the output of the two statements should be the same: one string is the UTF-8 encoded form of the code points, and the other is simply a list of the same code points.

Note that some of these symbols may not be visible in your console.

#include <iostream>

using namespace std;

int main() {
   cout << "\x61\xE0\xA4\xA8\xE0\xA4\xBF\xE4\xBA\x9C\xF0\x90\x82\x83" << endl;
   cout << "\u0061\u0928\u093F\u4E9C\u10083" << endl;

   return 0;
}

Original image source: http://ift.tt/1WLIgEF
