Here are some excerpts from my copy of the 2014 draft standard, N4140:
22.5 Standard code conversion facets [locale.stdcvt]
3 For each of the three code conversion facets codecvt_utf8, codecvt_utf16, and codecvt_utf8_utf16:
(3.1) Elem is the wide-character type, such as wchar_t, char16_t, or char32_t.
4 For the facet codecvt_utf8:
(4.1) The facet shall convert between UTF-8 multibyte sequences and UCS2 or UCS4 (depending on the size of Elem) within the program.
One interpretation of these two paragraphs is that wchar_t must be encoded as either UCS2 or UCS4. I don't like it much because if it's true, we have an important property of the language buried deep in a library description. I have tried to find a more direct statement of this property, but to no avail.
Another interpretation is that the wchar_t encoding is not required to be either UCS2 or UCS4, and on implementations where it isn't, codecvt_utf8 won't work for wchar_t. I don't like this interpretation much either, because if it's true, and neither the char nor the wchar_t native encoding is Unicode, there doesn't seem to be a way to portably convert between those native encodings and Unicode.
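To make the question concrete, here is a minimal sketch of the conversion that paragraph 4.1 describes, instantiated with Elem = wchar_t. It assumes a C++11/C++14 library (std::wstring_convert and the <codecvt> header are deprecated since C++17). The helper names utf8_to_wide and wide_to_utf8 are mine for illustration, not from the standard.

```cpp
#include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
#include <locale>   // std::wstring_convert
#include <string>

// Hypothetical helper: decode a UTF-8 byte string into wchar_t elements
// using the codecvt_utf8 facet from [locale.stdcvt].
// from_bytes throws std::range_error if the conversion fails.
inline std::wstring utf8_to_wide(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.from_bytes(utf8);
}

// Hypothetical helper: the reverse direction, wchar_t elements to UTF-8.
// to_bytes throws std::range_error if the conversion fails.
inline std::string wide_to_utf8(const std::wstring& wide) {
    std::wstring_convert<std::codecvt_utf8<wchar_t>> converter;
    return converter.to_bytes(wide);
}
```

On implementations where wchar_t holds UCS2 or UCS4, utf8_to_wide("\xE2\x82\xAC") should yield a single element with the value 0x20AC (the euro sign), and wide_to_utf8 should round-trip it. What the same call produces where the native wchar_t encoding is not Unicode is exactly what the two interpretations disagree on.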
Which of the two interpretations is true? Is there another one which I overlooked?