I'm trying to write the character "Ā" (https://www.fileformat.info/info/unicode/char/0100/index.htm) into a C++11 UTF-8 string (using the u8 prefix).
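```cpp
#include <cstring>  // std::strlen, std::size_t

// C++11: a u8 literal has type const char[N] (in C++20 it becomes const char8_t[N]).
const char *const utf8 = u8"Ā";        // character typed directly in the source
const char *const utf8_2 = u8"\u0100"; // same character as a universal character name
const char *const chars = "Ā";         // ordinary (non-u8) literal
const std::size_t utf8_len = std::strlen(utf8);
const std::size_t utf8_2_len = std::strlen(utf8_2);
const std::size_t chars_len = std::strlen(chars);
```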
Running this under MSVC (Visual Studio 2019, version 16.2.4) results in:
```
utf8_len == 5
utf8_2_len == 2
chars_len == 2
```
Where:
```
utf8 == "Ä€"
utf8_2 == "Ä€"
chars == "Ä€"
```
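Because all three strings render as "Ä€" even though their lengths differ, printing the raw bytes is a more reliable way to compare them. A minimal sketch; `hex_bytes` and `check_escape_form` are hypothetical helper names, not part of any library:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns the bytes of a NUL-terminated string as space-separated hex
// (e.g. "C4 80"), independent of how a debugger or console renders them.
std::string hex_bytes(const char *s) {
    std::string out;
    char buf[4];
    for (const char *p = s; *p != '\0'; ++p) {
        std::snprintf(buf, sizeof buf, "%02X",
                      static_cast<unsigned>(static_cast<unsigned char>(*p)));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Hypothetical self-check: a \u escape does not depend on how the compiler
// decodes the source file, so its UTF-8 form should always be C4 80.
// The cast is a no-op before C++20 and converts from const char8_t* in C++20.
void check_escape_form() {
    assert(hex_bytes(reinterpret_cast<const char *>(u8"\u0100")) == "C4 80");
}
```

Calling `hex_bytes` on `utf8`, `utf8_2` and `chars` shows exactly which bytes each literal contains, rather than how they happen to be displayed.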
The source file is saved as UTF-8 (without a BOM).
Trying the same with Clang and GCC works as expected: all three strings are 2 bytes long.
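One plausible explanation, assuming MSVC decodes a BOM-less source file using the active ANSI code page (taken here to be Windows-1252) instead of UTF-8: the two UTF-8 bytes of "Ā", C4 80, are read as the two characters "Ä" (U+00C4) and "€" (U+20AC), and the u8 prefix then re-encodes each of them as UTF-8, giving 2 + 3 = 5 bytes. Under the same assumption, the plain `chars` literal is converted back to that same code page and stays at C4 80, which would match its length of 2. The sketch below simulates that round trip; all function names are hypothetical:

```cpp
#include <cassert>
#include <string>

// Assumption: the compiler decodes source bytes as Windows-1252.
// Only 0x80-0x9F differ from Latin-1; undefined slots map to U+FFFD here.
char32_t cp1252_to_code_point(unsigned char b) {
    static const char32_t high[32] = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178};
    return (b >= 0x80 && b <= 0x9F) ? high[b - 0x80] : static_cast<char32_t>(b);
}

// Appends the UTF-8 encoding of one code point (assumed <= U+10FFFF).
void append_utf8(std::string &out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Simulates a u8 literal whose source bytes were first decoded as
// Windows-1252 and then re-encoded as UTF-8.
std::string u8_literal_if_read_as_cp1252(const std::string &source_bytes) {
    std::string out;
    for (unsigned char b : source_bytes) append_utf8(out, cp1252_to_code_point(b));
    return out;
}

// "Ā" is C4 80 on disk; misread, it becomes C3 84 ("Ä") + E2 82 AC ("€").
void check_misread() {
    const std::string misread = u8_literal_if_read_as_cp1252("\xC4\x80");
    assert(misread == "\xC3\x84\xE2\x82\xAC");
    assert(misread.size() == 5);
}
```

The 5 bytes produced here line up with the `utf8_len == 5` reported above, which is what makes the code-page hypothesis worth checking.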
Does anyone know why this is happening? Why is the u8-prefixed Unicode character encoded as 5 bytes when it should be 2?
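If the code-page explanation is right, the usual remedies are to tell MSVC the file is UTF-8 (the `/utf-8` compiler switch, or `/source-charset:utf-8`), to save the file with a BOM, or to keep using `\u` escapes as in `utf8_2`. A compile-time guard can also catch the problem early; a minimal sketch, assuming C++11 or later:

```cpp
// "Ā" is two bytes in UTF-8, so the array holds 2 bytes plus the
// terminating NUL. If the file were misread as Windows-1252, the literal
// would be 5 bytes plus NUL (6), and the build would stop here.
static_assert(sizeof(u8"Ā") == 3,
              "source not read as UTF-8; try compiling with /utf-8");

// The universal-character-name form does not depend on the source encoding.
static_assert(sizeof(u8"\u0100") == 3, "unexpected encoding of U+0100");
```

Placing the first assertion in a shared header means any project that forgets the compiler switch fails to build instead of silently producing mis-encoded strings.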