When using the C++11 standard, is there any guarantee that an ASCII character stored in a char32_t or char16_t codepoint will be properly cast to char?
char32_t and char16_t are both defined to always be unsigned (http://ift.tt/1hdX2hK). However, char may be signed or unsigned depending on the system.
I would assume that ASCII characters always work:
char32_t original = 'b';
char value = static_cast<char>(original);
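As a minimal sketch of why the ASCII case should be safe: every value from 0 to 127 fits in `char` whether it is signed or unsigned, so the narrowing conversion preserves the value. The helper name `narrowAscii` is hypothetical and only for illustration. Whether the resulting `char` compares equal to a literal such as `'b'` additionally assumes an ASCII-compatible execution character set.

```cpp
#include <cassert>
#include <climits>

// The standard requires CHAR_MAX to be at least 127, so 0..127 always fits.
static_assert(CHAR_MAX >= 127, "char must be able to hold 0..127");

// Hypothetical helper (not from the question): narrows a code point that is
// known to be in the ASCII range. The value is representable in char, so the
// conversion does not depend on the signedness of char.
char narrowAscii(char32_t codePoint) {
    assert(codePoint < 0x80 && "only valid for ASCII code points");
    return static_cast<char>(codePoint);
}
```

Widening the result back with `static_cast<char32_t>(narrowAscii(cp))` should then yield `cp` again for any `cp` below 0x80.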
However, what about UTF-8 code units with the high bit set (0x80 to 0xFF), extracted from a UTF-32 code point with a shift and a bitmask during conversion? For example:
char32_t someUtf32CodeUnit = 0x00001EA9;
// Second code unit of ẩ (UTF-8: E1 BA A9), i.e. 0xBA
char extractedCodeUnit = static_cast<char>(((someUtf32CodeUnit >> 6) & 0x3F) | 0x80);
Is this conversion guaranteed to produce the same bit pattern for the UTF-8 code unit on every system, or can the unsigned-to-signed conversion make a difference where char is signed?
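One way to make the concern concrete is sketched below. It computes the continuation byte with the same expression as above but keeps it in an `unsigned char`, then narrows to `char` and reads the bits back. The helper names `continuationUnit` and `roundTripThroughChar` are hypothetical. The sketch assumes an 8-bit `char`; in C++11 the narrowing step yields an implementation-defined value when `char` is signed and the value exceeds `CHAR_MAX`, so the round trip relies on the implementation wrapping modulo 2^8, which is what common two's-complement compilers do.

```cpp
#include <cassert>

// Hypothetical helper (not from the question): the continuation byte built
// from bits 6..11 of the code point, as in the expression above.
unsigned char continuationUnit(char32_t codePoint) {
    return static_cast<unsigned char>(((codePoint >> 6) & 0x3F) | 0x80);
}

// Hypothetical helper: store the unit in a plain char, then read it back.
unsigned char roundTripThroughChar(unsigned char unit) {
    // Implementation-defined value in C++11 if char is signed and
    // unit > CHAR_MAX (assumption: the implementation wraps modulo 2^8).
    char stored = static_cast<char>(unit);
    // Conversion to an unsigned type is always well defined (modular).
    return static_cast<unsigned char>(stored);
}
```

For U+1EA9, `continuationUnit` gives 0xBA, and on an implementation that satisfies the stated assumption `roundTripThroughChar(0xBA)` gives 0xBA back. Comparing code units as `unsigned char` rather than `char` avoids depending on the sign of the stored value.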