I have a library that uses std::wstring for handling strings (Unicode, i.e. UCS-2/UTF-16 on Windows) all over the place, and my task is to port it to Linux (with gcc 5.2). I am getting desperate with the character conversions. I use the following code for converting from UTF-16 to UTF-8, and it works fine on both platforms:
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
std::string convertedString = converter.to_bytes(utf16string);
Converting back from UTF-8 to UTF-16 with a similar approach works fine on Windows. On Linux no range_error is thrown, so apparently it works, but when I compare the result of UTF-16 -> UTF-8 -> UTF-16 with the string I passed to the first UTF-16 -> UTF-8 conversion, they are not identical:
std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
std::wstring convertedString = converter.from_bytes(utf8String);
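To make the failure concrete, here is a simplified version of the round-trip check that fails on Linux (the function name and the assert are just for illustration, not my real test code):

#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

void checkRoundTrip(const std::wstring& original)
{
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
    std::string utf8 = converter.to_bytes(original);   // UTF-16 -> UTF-8
    std::wstring back = converter.from_bytes(utf8);    // UTF-8 -> UTF-16
    assert(back == original);                          // holds on Windows, fails on Linux
}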
I came across this article:
BMane states that wstring should not be used, but that is not an option for me. Furthermore, if I try to use std::u16string and char16_t instead of std::wstring and wchar_t as he suggests, I still have the same problem (the test that verifies converting UTF-16 to UTF-8 and back fails).
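For completeness, the char16_t variant I tried looks roughly like this (simplified, identifiers are placeholders):

#include <codecvt>
#include <locale>
#include <string>

std::u16string roundTrip16(const std::u16string& utf16String)
{
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
    std::string utf8 = converter.to_bytes(utf16String);   // UTF-16 -> UTF-8
    return converter.from_bytes(utf8);                     // UTF-8 -> UTF-16, still differs from the input on Linux
}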
I have considered using libiconv instead of the STL, but on Linux wchar_t is 4 bytes long, which would require converting each character to a 2-byte type before passing it to iconv(). I hope this can be avoided.
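Roughly, the iconv() route would look like the sketch below (a hypothetical helper, not code I actually have; it assumes the wchar_t values already hold UTF-16 code units and that the machine is little-endian). The extra copy into a 2-byte buffer is exactly what I would like to avoid:

#include <iconv.h>
#include <stdexcept>
#include <string>
#include <vector>

std::string utf16ToUtf8ViaIconv(const std::wstring& input)
{
    // Extra copy: squeeze each 4-byte wchar_t into a 2-byte UTF-16 code unit
    // (this also silently ignores anything outside the BMP).
    std::vector<char16_t> utf16(input.begin(), input.end());

    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    size_t inBytes = utf16.size() * sizeof(char16_t);
    size_t outBytes = inBytes * 2;                 // generous upper bound for UTF-8 output
    std::string output(outBytes, '\0');

    char* inPtr = reinterpret_cast<char*>(utf16.data());
    char* outPtr = &output[0];
    size_t result = iconv(cd, &inPtr, &inBytes, &outPtr, &outBytes);
    iconv_close(cd);
    if (result == (size_t)-1)
        throw std::runtime_error("iconv failed");

    output.resize(output.size() - outBytes);       // trim the unused part of the buffer
    return output;
}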
Thank you for any help!