Thursday, December 1, 2016

Can wstring_convert just replace invalid characters?

I am currently working on a tool that extracts archives from a game for data mining. I extract metadata from the archives (number of files per archive, filenames, packed/unpacked sizes, etc.) and write it to a std::wstring for further analysis. I have run into an issue when converting filenames to wide characters with std::wstring_convert.

My code looks something like this now:

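#include <array>
#include <codecvt>
#include <cstdint>
#include <locale>
#include <sstream>
#include <string>

struct IndexEntry {
    std::int32_t file_id;
    std::array<char, 260> filename;  // raw bytes from the archive, assumed null-terminated
    // more fields
};

std::wstring foo(const IndexEntry& entry) {
    std::wstringstream buffer;
    // Interprets the bytes as UTF-8 and produces UTF-16 code units in wchar_t.
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter;
    buffer << entry.file_id << L'\n';
    // Throws std::range_error if the filename is not valid UTF-8.
    buffer << converter.from_bytes(entry.filename.data()) << L'\n';
    // add rest of the IndexEntry fields to the stream
    return buffer.str();
}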

The IndexEntry struct is filled by reading from files with a std::ifstream in binary mode. The error occurs in converter.from_bytes(): some of the filenames contain the byte 0x81, and when the converter encounters it, it throws a std::range_error exception.
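
The exception is the converter working as specified: 0x81 has the bit pattern 10xxxxxx, which in UTF-8 is only valid as a continuation byte following a lead byte, so a standalone 0x81 is malformed input. std::wstring_convert::from_bytes throws std::range_error only when the converter was built without a wide error string; the constructor taking a byte error string and a wide error string makes from_bytes return that wide string instead. The minimal sketch below illustrates this (the helper name to_wide_or_fallback and the marker L"<invalid>" are hypothetical). Note that the fallback replaces the entire result, not just the offending character:

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Converts UTF-8 bytes to a wide string without throwing on malformed input.
// Hypothetical helper: the (byte_err, wide_err) constructor stores a "wide
// error string" that from_bytes() returns when conversion fails. It replaces
// the WHOLE result, not only the bad byte.
// (std::wstring_convert and std::codecvt_utf8_utf16 are deprecated since C++17.)
std::wstring to_wide_or_fallback(const std::string& bytes,
                                 const std::wstring& fallback = L"<invalid>") {
    std::wstring_convert<std::codecvt_utf8_utf16<wchar_t>> converter(
        "?",        // byte error string, only used by to_bytes()
        fallback);  // returned by from_bytes() on failure; keep it non-empty,
                    // since some standard libraries may still throw if it is empty
    return converter.from_bytes(bytes);
}
```

So wstring_convert on its own cannot substitute individual characters; keeping the valid parts of a name requires decoding the bytes some other way.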

Is there a way to tell wstring_convert to replace characters it cannot convert with something else? Or is there a generally better way to handle this conversion?
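
If the names are mostly UTF-8 with a few stray bytes, one alternative to wstring_convert is a small hand-written decoder that substitutes U+FFFD (REPLACEMENT CHARACTER) for each byte that does not begin a well-formed sequence and then resumes at the next byte. This is a sketch under the assumption that the input is meant to be UTF-8; like codecvt_utf8_utf16<wchar_t>, it emits UTF-16 code units (surrogate pairs above U+FFFF) into std::wstring. The function name utf8_to_wide_replacing is hypothetical:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-16 code units stored in
// std::wstring. Each byte that does not start a well-formed sequence is
// replaced with U+FFFD, and decoding continues with the following byte.
std::wstring utf8_to_wide_replacing(const std::string& in) {
    const wchar_t kReplacement = 0xFFFD;
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        std::size_t len = 0;   // 0 = not a valid lead byte (e.g. a stray 0x81)
        char32_t cp = 0;
        char32_t min_cp = 0;   // smallest code point for this length (rejects overlong forms)
        if (b0 < 0x80)                { len = 1; cp = b0; }
        else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min_cp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min_cp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min_cp = 0x10000; }

        // Every following byte must be a continuation byte (10xxxxxx).
        bool ok = len != 0 && i + len <= in.size();
        for (std::size_t k = 1; ok && k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, values above U+10FFFF and surrogates.
        if (ok && (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))) ok = false;

        if (!ok) {
            out.push_back(kReplacement);
            ++i;  // skip only the offending byte
            continue;
        }
        if (cp >= 0x10000) {  // encode as a UTF-16 surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<wchar_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<wchar_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<wchar_t>(cp));
        }
        i += len;
    }
    return out;
}
```

This emits one U+FFFD per offending byte, so a truncated multi-byte sequence yields several replacement characters; that is simpler than, though not identical to, the "maximal subpart" substitution policy described in the Unicode Standard.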

This whole project is mostly a learning exercise. I wanted to do all internal string handling with wstring so I can get some experience dealing with strings in different encodings. Unfortunately, I have no idea which encoding was used to generate these archive files.
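
When the source encoding is unknown, a lossless option is to widen each byte to the code point with the same value, that is, to treat the data as ISO-8859-1 (Latin-1). This can never fail, and every original byte stays recoverable from the wstring. Bytes above 0x7F display correctly only if the data really is Latin-1, though (0x81 becomes the C1 control character U+0081). As a hint for guessing the real encoding: 0x81 is ü in DOS code page 437 and a lead byte in Shift_JIS, but is unassigned in Windows-1252. The helper name latin1_to_wide is hypothetical:

```cpp
#include <string>

// Hypothetical helper: widens each byte to the code point with the same
// value (ISO-8859-1 / Latin-1). Bytes 0x00-0xFF map to U+0000-U+00FF, so the
// conversion cannot fail and narrowing each wchar_t back restores the input.
std::wstring latin1_to_wide(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Once the actual code page is identified, the same call site can switch to a proper decoder for it.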
