I'm writing some code that needs to convert between byte strings and wide strings, using the system locale. When reading from a file, this is incredibly easy to do. I can use std::wifstream, imbue it with std::locale(""), and then just use std::getline.
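For reference, a minimal sketch of that file-reading approach. The helper name read_wide_lines and its loc parameter are made up for illustration; the default argument std::locale("") is what picks up the system locale.

```cpp
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper: read every line of a file as wide strings,
// decoding the bytes with the codecvt facet of the given locale.
std::vector<std::wstring> read_wide_lines(const std::string& path,
                                          const std::locale& loc = std::locale(""))
{
    std::wifstream in;
    in.imbue(loc);   // imbue before opening so the file buffer uses loc's codecvt
    in.open(path);

    std::vector<std::wstring> lines;
    std::wstring line;
    while (std::getline(in, line))
        lines.push_back(line);
    return lines;
}
```

Calling read_wide_lines("input.txt") would decode the file with the environment's locale; passing std::locale::classic() instead only handles plain ASCII reliably.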
According to cppreference's codecvt page, wifstream just uses codecvt<wchar_t, char, mbstate_t>, so I thought that I might be able to convert between std::string and std::wstring by using that as well:
    // utility wrapper to adapt locale-bound facets for wstring/wbuffer convert
    template<class Facet>
    struct deletable_facet : Facet
    {
        template<class ...Args>
        deletable_facet(Args&& ...args) : Facet(std::forward<Args>(args)...) {}
        ~deletable_facet() {}
    };

    std::locale::global(std::locale(""));
    std::wstring_convert<
        deletable_facet<std::codecvt<wchar_t, char, std::mbstate_t>>> wconv;
    std::wstring wstr = wconv.from_bytes(data);
However, when I try to run this, I get a std::range_error thrown from wstring_convert. I did some googling around, and apparently this is what happens when wstring_convert fails to convert the string.
However, these strings can clearly be converted using wifstream, which should be using the same codecvt as I am using with wstring_convert. So why does wifstream work, but wstring_convert not?
And is there a way that I can convert between strings and wstrings using the system's locale?
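One possible route that avoids wstring_convert entirely, sketched here as an assumption rather than a confirmed fix: the C library's std::mbsrtowcs converts according to the current C locale, which std::setlocale(LC_ALL, "") sets from the environment. The helper name widen_with_c_locale is made up for illustration, and the conversion stops at the first embedded null byte.

```cpp
#include <cstddef>
#include <cwchar>
#include <stdexcept>
#include <string>

// Hypothetical helper: widen a byte string using whatever C locale is
// currently active (for example after std::setlocale(LC_ALL, "")).
std::wstring widen_with_c_locale(const std::string& bytes)
{
    std::mbstate_t state = std::mbstate_t();
    const char* src = bytes.c_str();

    // First pass with a null destination: only count the wide characters.
    std::size_t len = std::mbsrtowcs(nullptr, &src, 0, &state);
    if (len == static_cast<std::size_t>(-1))
        throw std::range_error("invalid multibyte sequence for this locale");

    // Second pass: convert into a buffer of exactly that size.
    std::wstring out(len, L'\0');
    state = std::mbstate_t();
    src = bytes.c_str();
    std::mbsrtowcs(&out[0], &src, len, &state);
    return out;
}
```

To use the system locale, std::setlocale(LC_ALL, "") (from <clocale>) would need to be called once before the first conversion; without it the default "C" locale applies and non-ASCII bytes may be rejected.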
A full example of my problem, adapted from the codecvt page, is here, and the output is:
sizeof(char32_t) = 4
sizeof(wchar_t) = 4
The UTF-8 file contains the following UCS4 code points:
U+007a
U+00df
U+6c34
U+1f34c
The UTF-8 string contains the following UCS4 code points:
U+007a
U+00df
U+6c34
U+1f34c
terminate called after throwing an instance of 'std::range_error'
what(): wstring_convert
Aborted (core dumped)