Thursday, 6 February 2020

Unable to extract Unicode symbols from C++ std::string

I want to read a C++ std::string, pass it to a function that analyses it, and extract both the Unicode symbols and the plain ASCII symbols from it.

I searched many tutorials online, but all of them said that standard C++ does not fully support Unicode. Many of them recommended using the ICU C++ library instead.
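To make that limitation concrete, here is a minimal sketch, assuming the string holds UTF-8. std::string only stores bytes, so size() reports bytes rather than symbols. The helper name count_code_points is hypothetical and only for illustration:

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string
// by skipping continuation bytes (those of the form 10xxxxxx).
// Assumes the input is well-formed UTF-8; it does no validation.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t n = 0;
    for (unsigned char c : utf8)
        if ((c & 0xC0) != 0x80)
            ++n;
    return n;
}
```

For "Hello☺" encoded as UTF-8, size() is 8 because ☺ (U+263A) takes three bytes, while this helper counts 6 symbols.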

This is my C++ program for understanding the basics of the functionality described above. It reads the raw string, converts it to an ICU UnicodeString, and prints it:

#include <iostream>
#include <string>
#include "unicode/unistr.h"

int main()
{
    std::string s="Hello☺";
    // at this point s contains a line of text
    // which may be ANSI or UTF-8 encoded

    // convert std::string to ICU's UnicodeString
    icu::UnicodeString ucs = icu::UnicodeString::fromUTF8(icu::StringPiece(s.c_str()));

    // convert UnicodeString to std::wstring
    std::wstring ws;
    for (int i = 0; i < ucs.length(); ++i)
      ws += static_cast<wchar_t>(ucs[i]);

    std::wcout << ws << std::endl;
}

Expected Output:

Hello☺

Actual Output:

Hello?

Please tell me what I am doing wrong. Any alternative or simpler approaches would also be welcome.

Thanks
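As one possible simpler approach, the sketch below separates ASCII from non-ASCII symbols using only the standard library, assuming the std::string holds UTF-8. The names decode_utf8, Symbols and split_symbols are hypothetical, not part of ICU or the standard library. The decoder replaces truncated or malformed sequences with U+FFFD but does not reject overlong encodings or surrogates, so treat it as an illustration rather than a validated implementation:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes a UTF-8 byte string into code points.
// Invalid lead bytes and broken sequences become U+FFFD.
std::vector<char32_t> decode_utf8(const std::string& s)
{
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }

        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= s.size() ||
                (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
                ok = false;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(s[i]) & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : char32_t(0xFFFD));
    }
    return out;
}

// Hypothetical result type: ASCII symbols and everything else.
struct Symbols {
    std::string ascii;                // code points below U+0080
    std::vector<char32_t> non_ascii;  // all other code points
};

// Splits a UTF-8 string into its ASCII and non-ASCII symbols.
Symbols split_symbols(const std::string& utf8)
{
    Symbols result;
    for (char32_t cp : decode_utf8(utf8)) {
        if (cp < 0x80)
            result.ascii += static_cast<char>(cp);
        else
            result.non_ascii.push_back(cp);
    }
    return result;
}
```

For "Hello☺", split_symbols would put "Hello" in ascii and the single code point U+263A in non_ascii. As for the `?` in the output above, one plausible cause is that std::wcout starts in the default "C" locale, which cannot convert non-ASCII wide characters. Setting a UTF-8 locale before printing is a commonly suggested remedy, though the behaviour depends on the platform. Note also that ICU's UnicodeString stores UTF-16 code units, so copying them one by one into wchar_t would split symbols outside the Basic Multilingual Plane.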
