Saturday, May 30, 2015

Why is wstring_convert throwing a range_error?

I'm writing some code that needs to convert between byte strings and wide strings, using the system locale. When reading from a file, this is incredibly easy to do. I can use std::wifstream, imbue it with std::locale(""), and then just use std::getline.
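For reference, the file-reading approach described above can be sketched roughly as follows. The helper name `read_wide_lines` and its parameters are made up for illustration. The sketch assumes the locale is imbued before the file is opened, so the stream buffer uses that locale's codecvt facet from the first byte.

```cpp
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper (not from the original code): reads every line of
// `path` as a wide string, decoding the bytes with the codecvt facet of `loc`.
// By default it uses the user's preferred locale, std::locale("").
std::vector<std::wstring> read_wide_lines(const std::string& path,
                                          const std::locale& loc = std::locale(""))
{
    std::wifstream in;
    in.imbue(loc);   // imbue first, so the file buffer decodes with this locale
    in.open(path);

    std::vector<std::wstring> lines;
    for (std::wstring line; std::getline(in, line); )
        lines.push_back(line);
    return lines;
}
```

Calling `read_wide_lines("input.txt")` would decode the file with whatever locale the environment selects. Passing `std::locale::classic()` instead restricts it to the "C" locale.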

According to cppreference's codecvt page, wifstream just uses codecvt<wchar_t, char, mbstate_t>, so I thought that I might be able to convert between std::string and std::wstring by using that as well:

#include <locale>
#include <string>
#include <utility>

// utility wrapper to adapt locale-bound facets for wstring/wbuffer convert
template<class Facet>
struct deletable_facet : Facet
{
    template<class ...Args>
    deletable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() {}
};

std::locale::global(std::locale(""));
std::wstring_convert<
    deletable_facet<std::codecvt<wchar_t, char, std::mbstate_t>>> wconv;
std::wstring wstr = wconv.from_bytes(data);

However, when I try to run this, I get a range_error thrown from wstring_convert. I did some googling around, and apparently this is what happens when wstring_convert fails to convert the string.
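To illustrate how that failure surfaces, here is a minimal sketch that catches the exception instead of letting it terminate the program. It deliberately swaps in `std::codecvt_utf8<wchar_t>` rather than the locale facet, so the behavior does not depend on the environment. The helper name `try_utf8_to_wide` is hypothetical. Both `wstring_convert` and `<codecvt>` are deprecated since C++17, so a compiler may warn.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true and fills `out` on success, or false
// when wstring_convert reports a conversion failure via std::range_error.
// Assumption: the input is meant to be UTF-8 (not the system locale encoding).
bool try_utf8_to_wide(const std::string& bytes, std::wstring& out)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
    try {
        out = conv.from_bytes(bytes);
        return true;
    } catch (const std::range_error&) {
        // from_bytes throws when the facet cannot convert the input
        return false;
    }
}
```

`wstring_convert` also has a constructor taking error strings, in which case `from_bytes` returns the wide error string instead of throwing.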

However, these strings can clearly be converted using wifstream, which should be using the same codecvt as I am using with wstring_convert. So why does wifstream work, but wstring_convert not?

And is there a way I can convert between strings and wstrings using the system's locale?
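One direction that might be worth trying is to borrow the facet from the locale object itself, which is presumably what the stream does after `imbue`, rather than constructing a fresh facet. A facet constructed directly is not taken from `std::locale("")` and may well behave like the "C" locale. The sketch below is only an assumption-laden illustration: the function name `widen_with_locale` is made up, and it assumes the encoding never produces more wide characters than input bytes.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `bytes` to a wide string using the codecvt
// facet that belongs to `loc` (for example std::locale("")).
std::wstring widen_with_locale(const std::string& bytes, const std::locale& loc)
{
    using Codecvt = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Codecvt& cvt = std::use_facet<Codecvt>(loc);

    // Assumption: at most one wchar_t is produced per input byte.
    std::wstring out(bytes.size(), L'\0');
    if (bytes.empty())
        return out;

    std::mbstate_t state{};
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;
    const auto res = cvt.in(state,
                            bytes.data(), bytes.data() + bytes.size(), from_next,
                            &out[0], &out[0] + out.size(), to_next);

    // Treat both invalid and incomplete input as a failure.
    if (res != std::codecvt_base::ok)
        throw std::range_error("widen_with_locale: conversion failed");

    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

With this, `widen_with_locale(data, std::locale(""))` would use the same locale that was imbued into the stream. The opposite direction would use the facet's `out` member in the same way.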

A full example of my problem, adapted from the codecvt page, is here, and the output is:

sizeof(char32_t) = 4
sizeof(wchar_t)  = 4
The UTF-8 file contains the following UCS4 code points: 
U+007a
U+00df
U+6c34
U+1f34c
The UTF-8 string contains the following UCS4 code points: 
U+007a
U+00df
U+6c34
U+1f34c
terminate called after throwing an instance of 'std::range_error'
  what():  wstring_convert
Aborted (core dumped)
