Tuesday, 2 February 2016

std::wstring_convert / std::codecvt differences on MSVC 2013 (Windows) / gcc (Linux)

I have a library that uses std::wstring for handling (Unicode, say UCS-2 on Windows) strings all over the place, and my task is to port it to Linux (with gcc 5.2). I am getting desperate with the character conversions. I use the following code for converting from UTF-16 to UTF-8, and it works fine on both platforms:

        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
        std::string convertedString = converter.to_bytes(utf16string);

Converting back from UTF-8 to UTF-16 with a similar approach works fine on Windows. On Linux I do not catch a range_error, so apparently it works, but when I compare the result of UTF-16 -> UTF-8 -> UTF-16 with the parameter I passed to the first UTF-16 -> UTF-8 conversion, they are not identical:

        std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
        std::wstring convertedString = converter.from_bytes(utf8String);
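
Concretely, the check that fails on Linux is a round trip like the following (a minimal sketch; checkRoundTrip and utf16Input are placeholder names for one of my tests):

        #include <cassert>
        #include <codecvt>
        #include <locale>
        #include <string>

        // Round-trips UTF-16 -> UTF-8 -> UTF-16 and compares with the input.
        void checkRoundTrip(const std::wstring& utf16Input)
        {
            std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>, wchar_t> converter;
            std::string utf8 = converter.to_bytes(utf16Input);
            std::wstring roundTripped = converter.from_bytes(utf8);
            assert(roundTripped == utf16Input); // holds on Windows, fails on Linux for me
        }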

I came across this article:

BMane states that wstring should not be used, but that is not an option for me. Furthermore, if I try std::u16string and char16_t instead of std::wstring and wchar_t, as he suggests, I still have the same problem: the test that verifies that converting UTF-16 to UTF-8 and back yields the original string fails.
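
For reference, the char16_t variant I tried looks roughly like this (a sketch; utf8String is a placeholder name):

        #include <codecvt>
        #include <locale>
        #include <string>

        // Same conversion, but with char16_t instead of wchar_t;
        // the round-trip test fails the same way on my Linux setup.
        std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> converter;
        std::u16string utf16String = converter.from_bytes(utf8String);
        std::string backToUtf8 = converter.to_bytes(utf16String);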

I have considered using libiconv instead of the STL, but on Linux wchar_t is 4 bytes long, which would require converting each character to a 2-byte type before passing it to iconv(). I hope that this can be avoided. The extra narrowing step would look something like the sketch below (wideToUtf8ViaIconv is a hypothetical helper of mine; it assumes little-endian byte order and that every code point fits in one 2-byte unit, i.e. the plain copy produces no surrogate pairs):
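
        #include <iconv.h>
        #include <stdexcept>
        #include <string>
        #include <vector>

        // Hypothetical helper: narrow each 4-byte wchar_t to a 2-byte
        // unit, then let iconv convert the buffer to UTF-8.
        std::string wideToUtf8ViaIconv(const std::wstring& input)
        {
            if (input.empty())
                return std::string();

            std::u16string narrow(input.begin(), input.end()); // per-character narrowing
            iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
            if (cd == (iconv_t)-1)
                throw std::runtime_error("iconv_open failed");

            char* inPtr = reinterpret_cast<char*>(&narrow[0]);
            size_t inLeft = narrow.size() * sizeof(char16_t);
            std::vector<char> outBuf(inLeft * 2 + 4); // BMP code points need at most 3 UTF-8 bytes each
            char* outPtr = outBuf.data();
            size_t outLeft = outBuf.size();

            size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
            iconv_close(cd);
            if (rc == (size_t)-1)
                throw std::runtime_error("iconv conversion failed");

            return std::string(outBuf.data(), outBuf.size() - outLeft);
        }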

Thank you for any help!
