Thursday, 4 August 2016

Does the C++ standard mandate an encoding for wchar_t?

Here are some excerpts from my copy of the 2014 draft standard, N4140:

22.5 Standard code conversion facets [locale.stdcvt]

3 For each of the three code conversion facets codecvt_utf8, codecvt_utf16, and codecvt_utf8_utf16:
(3.1) — Elem is the wide-character type, such as wchar_t, char16_t, or char32_t.

4 For the facet codecvt_utf8:
(4.1) — The facet shall convert between UTF-8 multibyte sequences and UCS2 or UCS4 (depending on the size of Elem) within the program.
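To make the quoted wording concrete, here is a minimal sketch of how the facet is typically used with wchar_t as Elem. The helper names utf8_to_wide and check_sample are hypothetical and are not part of the standard; the sketch assumes a C++11/C++14 library that still ships <codecvt> and std::wstring_convert (both were deprecated in C++17).

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper (not from the standard): decode UTF-8 bytes into
// wchar_t elements using the codecvt_utf8 facet quoted above.
// from_bytes throws std::range_error if the conversion fails.
std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>, wchar_t> conv;
    return conv.from_bytes(utf8);
}

// Hypothetical check: U+00E9 is encoded in UTF-8 as the bytes C3 A9.
// Per (4.1), the resulting element should hold the code point itself,
// whatever the implementation's native wchar_t encoding is.
void check_sample() {
    const std::wstring w = utf8_to_wide("\xC3\xA9");
    assert(w.size() == 1);
    assert(static_cast<unsigned long>(w[0]) == 0xE9UL);
}
```

Note that the sketch only shows what the facet stores in wchar_t elements. It does not say whether wide literals or wide I/O on the same implementation use that encoding, which is exactly the open question.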

One interpretation of these two paragraphs is that wchar_t must be encoded as either UCS2 or UCS4. I don't like it much because if it's true, we have an important property of the language buried deep in a library description. I have tried to find a more direct statement of this property, but to no avail.

Another interpretation is that the wchar_t encoding is not required to be either UCS2 or UCS4, and on implementations where it isn't, codecvt_utf8 won't work for wchar_t. I don't like this interpretation much either, because if it's true, and neither the char nor the wchar_t native encoding is Unicode, there doesn't seem to be a way to portably convert between those native encodings and Unicode.
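The two interpretations predict different things for wide literals, and that can be probed on a given implementation. The following is only a rough sketch: the helper names wide_literal_is_code_point, form_by_size and check_probe are hypothetical, and a single sample character proves nothing about the encoding in general.

```cpp
#include <cassert>
#include <climits>

// Hypothetical probe: does a wide literal carry the Unicode code point
// value for one sample character (U+00E9)?
constexpr bool wide_literal_is_code_point() {
    return static_cast<unsigned long>(L'\u00E9') == 0xE9UL;
}

// Hypothetical helper: the form that "depending on the size of Elem"
// would select for wchar_t, judged by its width in bits.
constexpr const char* form_by_size() {
    return sizeof(wchar_t) * CHAR_BIT >= 32 ? "UCS4" : "UCS2";
}

// Assumption: under the first interpretation this holds everywhere;
// under the second it may fire on some implementation.
void check_probe() {
    assert(wide_literal_is_code_point());
}
```

If check_probe fires somewhere, that implementation is a counterexample to the first interpretation. If it passes, that is consistent with either reading.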

Which of the two interpretations is true? Is there another one which I overlooked?
