Friday, 12 January 2018

Is `std::regex` always locale-aware?

On http://ift.tt/29fwzID, one of the flags for the std::regex constructor is collate, which specifies that "Character ranges of the form "[a-b]" will be locale sensitive". This indicates, to me, that std::regex is not (entirely) locale-aware by default. I can't find anything that explicitly claims it is locale-aware, but then there is std::regex_traits, which suggests that some locale-awareness is going on.

To what extent is std::regex locale-aware? Is it possible to read a UTF-8 string, store it in a plain std::string, and just use regex character classes such as [:w:] and [:punct:]? Specifically, [:w:] might be a problem; [:punct:] is not important.

This is for a C++ library that must work on macOS (which has UTF-8 locales) and on Windows (which, as far as I can tell, does not).
