I am currently working on a tool that extracts archives from a game for the purpose of data mining. I extract metadata from the archives (number of files per archive, filenames, packed/unpacked sizes, etc.) and write it to a std::wstring for further analysis. I have run into an issue converting filenames to wide characters using std::wstring_convert.
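My code currently looks something like this:

```cpp
#include <array>
#include <codecvt>
#include <cstdint>
#include <locale>
#include <sstream>
#include <string>

struct IndexEntry {
    std::int32_t file_id;
    std::array<char, 260> filename;  // assumed NUL-terminated; from_bytes(const char*) stops at the first NUL
    // more fields
};

std::wstring foo(const IndexEntry& entry) {
    std::wstringstream buffer;
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    buffer << entry.file_id << L'\n';
    // Throws std::range_error if the filename bytes are not valid UTF-8.
    buffer << converter.from_bytes(entry.filename.data()) << L'\n';
    // add rest of the IndexEntry fields to the stream
    return buffer.str();
}
```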
The IndexEntry struct is filled by reading from files with a std::ifstream in binary mode. The error happens in converter.from_bytes(): some of the filenames contain the byte 0x81, and when the converter encounters it, it throws a std::range_error exception. (In UTF-8, 0x81 can only appear as a continuation byte inside a multi-byte sequence, so on its own it is invalid.)
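A minimal sketch that reproduces the failure in isolation, assuming the same std::codecvt_utf8_utf16<wchar_t> facet as above. The helper name from_bytes_throws is hypothetical; it only shows that a single stray 0x81 byte is enough to make from_bytes throw, and it does not read the archive.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns true if std::wstring_convert rejects the bytes.
// 0x81 has the bit pattern 10xxxxxx, i.e. it is a UTF-8 continuation byte,
// so it cannot start a character and the converter reports an error.
bool from_bytes_throws(const std::string& bytes) {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    try {
        converter.from_bytes(bytes);
        return false;
    } catch (const std::range_error&) {
        return true;
    }
}
```

For example, from_bytes_throws("abc\x81.dat") returns true, while from_bytes_throws("plain.dat") returns false.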
Is there a way to tell wstring_convert to replace characters it cannot convert with something else? Or is there a generally better way to handle this conversion?
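Regarding the first question: std::wstring_convert has a constructor that takes a byte error string and a wide error string. If a wide error string is supplied, from_bytes returns it instead of throwing std::range_error. A sketch, where the helper name to_wide_or_placeholder and the placeholder text are hypothetical choices:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: the (byte_err, wide_err) constructor makes from_bytes()
// return wide_err on failure instead of throwing std::range_error.
// Note that the *whole* result is replaced, not only the offending character.
std::wstring to_wide_or_placeholder(const std::string& bytes) {
    using Converter = std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>>;
    Converter converter("<conversion failed>", L"<conversion failed>");
    return converter.from_bytes(bytes);
}
```

The drawback is that the entire filename is lost, including its valid characters. Also note that std::wstring_convert and the <codecvt> facets are deprecated since C++17.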
This whole project is mostly a learning exercise. I wanted to do all internal string handling with wstring, so I can get some experience dealing with strings in different encodings. Unfortunately, I have no idea which encoding was used to generate these archive files.
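If per-character replacement is wanted instead, one option is to bypass wstring_convert and decode UTF-8 by hand, substituting U+FFFD (REPLACEMENT CHARACTER) for every byte that does not start a well-formed sequence. The sketch below assumes the filenames are at least mostly UTF-8; the function decode_utf8_lenient is hypothetical. Because the encoding is unknown, it also counts the replaced bytes: a high count suggests the names use a legacy code page instead (for example, 0x81 is a valid lead byte in Shift_JIS).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 leniently into a std::wstring.
// Every byte that does not start a well-formed UTF-8 sequence becomes U+FFFD
// and is counted in invalid_count.
std::wstring decode_utf8_lenient(const std::string& in, std::size_t& invalid_count) {
    std::wstring out;
    invalid_count = 0;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;  // 0 means "not a valid lead byte"
        if (b0 < 0x80) {
            cp = b0; len = 1;
        } else if (b0 >= 0xC2 && b0 <= 0xDF) {
            cp = b0 & 0x1F; len = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            cp = b0 & 0x0F; len = 3;
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            cp = b0 & 0x07; len = 4;
        }
        bool ok = len != 0 && i + len <= in.size();
        // Each following byte must be a continuation byte (10xxxxxx).
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) {
                ok = false;
            } else {
                cp = (cp << 6) | (b & 0x3F);
            }
        }
        // Reject overlong 3/4-byte forms, surrogate code points and values above U+10FFFF.
        if (ok && ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
                   (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)) {
            ok = false;
        }
        if (!ok) {
            cp = 0xFFFD;
            len = 1;  // skip only the offending byte and resynchronize
            ++invalid_count;
        }
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (e.g. Windows): emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
        i += len;
    }
    return out;
}
```

Skipping a single byte after each error keeps the decoder simple and lets it resynchronize on the next valid lead byte; the cost is that one broken multi-byte sequence may produce several U+FFFD characters.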