I'm writing some code that needs to convert between byte strings and wide strings, using the system locale. When reading from a file, this is incredibly easy to do. I can use std::wifstream, imbue it with std::locale(""), and then just use std::getline.
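For reference, the file-reading approach described above might look roughly like the sketch below. The helper name `read_wide_lines` and its parameters are hypothetical, chosen only for illustration. It imbues the stream before opening the file, on the assumption that the locale's conversion facet should apply from the first byte.

```cpp
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: read every line of `path` as a wide string, letting the
// stream decode the bytes with the codecvt facet stored in `loc`.
// Note: std::locale("") may throw std::runtime_error if the environment names
// a locale the system does not provide.
std::vector<std::wstring> read_wide_lines(const std::string& path,
                                          const std::locale& loc = std::locale(""))
{
    std::wifstream in;
    in.imbue(loc);   // imbue before opening so the facet is in place for all input
    in.open(path);

    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;    // empty if the file could not be opened
}
```

Calling `read_wide_lines("input.txt")` uses the locale taken from the environment; passing `std::locale::classic()` instead restricts decoding to what the "C" locale supports.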
According to cppreference's codecvt page, wifstream just uses codecvt<wchar_t, char, mbstate_t>, so I thought that I might be able to convert between std::string and std::wstring by using that as well:
// utility wrapper to adapt locale-bound facets for wstring/wbuffer convert
template<class Facet>
struct deletable_facet : Facet
{
    template<class ...Args>
    deletable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
    ~deletable_facet() {}
};

std::locale::global(std::locale(""));
std::wstring_convert<
    deletable_facet<std::codecvt<wchar_t, char, std::mbstate_t>>> wconv;
std::wstring wstr = wconv.from_bytes(data);
However, when I run this, wstring_convert throws a std::range_error. From what I could find by searching, this is what happens when wstring_convert fails to convert the string.
Yet these same strings convert without any problem through wifstream, which should be using the same codecvt that I am passing to wstring_convert. So why does wifstream work when wstring_convert does not?
And is there a way I can convert between strings and wstrings using the system's locale?
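One possible direction, offered as a sketch rather than a definitive answer: a default-constructed std::codecvt<wchar_t, char, std::mbstate_t> is a standalone facet and is presumably not tied to std::locale(""), whereas a stream uses the facet stored in its imbued locale. Under that assumption, the same facet can be fetched with std::use_facet and driven directly. The helper name `widen_with_locale` is hypothetical.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode `bytes` using the codecvt facet held by `loc`,
// i.e. the facet a wide file stream imbued with `loc` would use.
std::wstring widen_with_locale(const std::string& bytes, const std::locale& loc)
{
    if (bytes.empty())
        return std::wstring();

    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& cvt = std::use_facet<Facet>(loc);

    // Assumption: each output wchar_t consumes at least one input byte,
    // so bytes.size() is enough room for the result.
    std::wstring out(bytes.size(), L'\0');
    std::mbstate_t state{};
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;

    const std::codecvt_base::result res = cvt.in(
        state,
        bytes.data(), bytes.data() + bytes.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    // Treat both invalid and truncated input as a conversion failure.
    if (res == std::codecvt_base::error || res == std::codecvt_base::partial)
        throw std::range_error("widen_with_locale: conversion failed");

    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

With the system locale this would be called as `widen_with_locale(data, std::locale(""))`. Whether a given byte sequence converts still depends on the encoding that locale names, so a UTF-8 string needs a UTF-8 locale in the environment.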
A full example of my problem, adapted from the codecvt page, is here, and the output is:
sizeof(char32_t) = 4
sizeof(wchar_t) = 4
The UTF-8 file contains the following UCS4 code points:
U+007a
U+00df
U+6c34
U+1f34c
The UTF-8 string contains the following UCS4 code points:
U+007a
U+00df
U+6c34
U+1f34c
terminate called after throwing an instance of 'std::range_error'
what(): wstring_convert
Aborted (core dumped)