Thursday, 23 February 2017

Length error initializing UTF-8 strings with hexadecimal values

I'm trying to use C++11 u8, u and U literals to encode this emoji: 😁
http://ift.tt/2lOMWAH

Now I'm using the hex values of each encoding to initialize the strings:

const char* utf8string = u8"\xF0\x9F\x98\x81";
const char16_t* utf16string = u"\xD83D\xDE01";
const char32_t* utf32string = U"\x0001F601";

This works fine with GCC 6.2 and Clang 3.8: the strings have lengths of 4, 2, and 1 code units respectively. But with the Visual Studio 2015 compiler the lengths come out as 8, 2, and 1.
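As a cross-check, the expected code-unit counts can also be read off with sizeof by initializing arrays from the same literals, since the arrays keep the terminating null (a minimal sketch; the array names are just illustrative):

#include <iostream>

int main() {
    // The code-unit count is sizeof(array) / sizeof(element) minus the
    // terminating null kept by array initialization.
    const char     u8arr[]  = u8"\xF0\x9F\x98\x81";
    const char16_t u16arr[] = u"\xD83D\xDE01";
    const char32_t u32arr[] = U"\x0001F601";

    std::cout << sizeof(u8arr) - 1 << std::endl;                     // 4 with GCC/Clang, 8 with VS 2015
    std::cout << sizeof(u16arr) / sizeof(char16_t) - 1 << std::endl; // 2
    std::cout << sizeof(u32arr) / sizeof(char32_t) - 1 << std::endl; // 1
}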

I'm using this code to get the length of each string:

#include <iostream>
#include <cwchar>

int main() {
    const char* smiley8 = u8"\xF0\x9F\x98\x81";
    const char16_t* smiley16 = u"\xD83D\xDE01";
    const char32_t* smiley32 = U"\x0001F601";

    // Advance each iterator to the terminating null character.
    auto smiley8_it = smiley8;
    while ((*++smiley8_it) != 0);

    auto smiley16_it = smiley16;
    while ((*++smiley16_it) != 0);

    auto smiley32_it = smiley32;
    while ((*++smiley32_it) != 0);

    // Code-unit count = distance from the start to the null terminator.
    size_t smiley8_size = smiley8_it - smiley8;
    size_t smiley16_size = smiley16_it - smiley16;
    size_t smiley32_size = smiley32_it - smiley32;

    std::cout << smiley8_size << std::endl;
    std::cout << smiley16_size << std::endl;
    std::cout << smiley32_size << std::endl;
}
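The hand-written loops are equivalent to std::char_traits<CharT>::length, which also counts code units up to the terminating null; a sketch of the same measurement:

#include <iostream>
#include <string>

int main() {
    // char_traits<CharT>::length counts code units up to the first null,
    // just like the pointer walks above.
    std::cout << std::char_traits<char>::length(u8"\xF0\x9F\x98\x81") << std::endl;
    std::cout << std::char_traits<char16_t>::length(u"\xD83D\xDE01") << std::endl;
    std::cout << std::char_traits<char32_t>::length(U"\x0001F601") << std::endl;
}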

I also tested the UTF-8 string using std::strlen.
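A minimal sketch of that check, using the same literal as above:

#include <cstring>
#include <iostream>

int main() {
    const char* smiley8 = u8"\xF0\x9F\x98\x81";
    // strlen counts bytes up to the first null, matching the loop above:
    // 4 with GCC 6.2 / Clang 3.8, 8 with the Visual Studio 2015 compiler.
    std::cout << std::strlen(smiley8) << std::endl;
}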

Any clues why this happens?
