Thursday, January 28, 2021

Detecting Unicode in files in Windows 10

Windows 10 Notepad no longer requires Unicode files to have a BOM header, and it no longer writes the header by default. This breaks existing code that checks the header to detect Unicode in files. How can I now tell in C++ whether a file is Unicode? Source: https://www.bleepingcomputer.com/news/microsoft/windows-10-notepad-is-getting-better-utf-8-encoding-support/

The code we currently use to detect Unicode:

int IsUnicode(const BYTE p2bytes[3])
{
        // Compare the leading bytes against the known byte order marks (BOMs).
        if( p2bytes[0]==0xEF && p2bytes[1]==0xBB && p2bytes[2]==0xBF)
            return 1; // UTF-8
        if( p2bytes[0]==0xFE && p2bytes[1]==0xFF)
            return 2;  // UTF-16 (BE)
        if( p2bytes[0]==0xFF && p2bytes[1]==0xFE)
            return 3; // UTF-16 (LE)

        return 0; // no BOM found
}
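
Without a BOM there is no marker to test, so one possible fallback is to check whether the file contents are well-formed UTF-8. The sketch below is not part of the original code: the name LooksLikeUtf8 and its parameters are hypothetical, and it assumes the caller has already read the bytes into memory and that the buffer does not end in the middle of a character. It is a heuristic, not a guarantee: pure ASCII passes because it is valid UTF-8, and UTF-16 without a BOM is not detected at all.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (an assumption, not from the original post):
// returns true when the whole buffer is well-formed UTF-8.
bool LooksLikeUtf8(const std::uint8_t* data, std::size_t len)
{
    std::size_t i = 0;
    while (i < len) {
        const std::uint8_t b = data[i];
        if (b < 0x80) { ++i; continue; }   // plain ASCII byte

        std::size_t extra = 0;   // continuation bytes expected after the lead
        std::uint32_t cp = 0;    // code point being decoded
        std::uint32_t min = 0;   // smallest code point legal for this length
        if ((b & 0xE0) == 0xC0)      { extra = 1; cp = b & 0x1F; min = 0x80; }
        else if ((b & 0xF0) == 0xE0) { extra = 2; cp = b & 0x0F; min = 0x800; }
        else if ((b & 0xF8) == 0xF0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else return false;           // stray continuation or invalid lead byte

        if (len - i <= extra) return false;   // sequence runs past the buffer

        for (std::size_t k = 1; k <= extra; ++k) {
            const std::uint8_t c = data[i + k];
            if ((c & 0xC0) != 0x80) return false;   // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min) return false;                       // overlong encoding
        if (cp > 0x10FFFF) return false;                  // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // UTF-16 surrogate

        i += extra + 1;
    }
    return true;
}
```

A caller could keep the BOM check first and only fall back to this validation when IsUnicode returns 0. Text in a legacy 8-bit code page that contains accented characters usually fails the check, because a lone byte such as 0xE9 is not a valid UTF-8 sequence.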
