Wednesday, July 21, 2021

Different behavior in C regex vs C++ regex using extended POSIX grammar

I am seeing different results for the same pattern from the C POSIX regex library (regcomp/regexec) and from the C++ standard library's std::regex. Here is my code:

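    #include <regex.h>   // POSIX regcomp/regexec/regfree
    #include <iostream>
    #include <regex>     // std::regex
    #include <string>
    using namespace std;

    int main() {
        string pattern = "\\s";
        string testString = " ";

        regex_t cre;
        int status = regcomp(&cre, pattern.c_str(), REG_EXTENDED);
        int result = (status == 0 && regexec(&cre, testString.c_str(), 0, nullptr, 0) == 0);
        cout << "C: " << result << endl;
        if (status == 0) regfree(&cre);  // free the compiled POSIX pattern

        regex re(pattern, regex_constants::extended);  // constructor throws std::regex_error
        smatch sm;
        cout << "C++: " << regex_search(testString, sm, re) << endl;
    }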

The C version matches the whitespace, but constructing the C++ regex throws this error:

    terminate called after throwing an instance of 'std::regex_error'
      what():  Unexpected escape character.
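To test the C side on its own, the regcomp/regexec calls can be wrapped in a helper that checks regcomp's return value, reports the compile error via regerror, and frees the compiled pattern with regfree. This is a minimal sketch that assumes a POSIX system (regex.h is not part of standard C or C++); the name posixMatches and its error parameter are hypothetical. It uses REG_NOSUB because only a yes/no answer is needed, and [[:space:]], the POSIX bracket-expression spelling of the whitespace class.

```cpp
#include <regex.h>  // POSIX regcomp/regexec/regerror/regfree
#include <string>

// Hypothetical helper: compiles `pattern` as a POSIX extended regex and
// reports whether it matches anywhere in `text`. A pattern that fails to
// compile yields false; if `error` is non-null it receives regerror's message.
bool posixMatches(const std::string& pattern, const std::string& text,
                  std::string* error = nullptr) {
    regex_t cre;
    // REG_NOSUB: we only need to know whether it matched, not where.
    int status = regcomp(&cre, pattern.c_str(), REG_EXTENDED | REG_NOSUB);
    if (status != 0) {
        if (error) {
            char buf[256];
            regerror(status, &cre, buf, sizeof buf);
            *error = buf;
        }
        return false;  // compilation failed, so there is nothing to regfree
    }
    bool matched = regexec(&cre, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&cre);  // release the compiled pattern
    return matched;
}
```

For example, posixMatches("[[:space:]]", " ") returns true, while an unbalanced pattern such as "(" returns false and fills the error string with the library's message.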

I understand that the backslash is escaped in the string literal, so the regex actually passed to both libraries is \s. I only see this problem with the POSIX extended grammar: if I construct the C++ regex without specifying a grammar, it defaults to ECMAScript and parses the pattern correctly.
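The grammar difference can be checked in isolation. POSIX leaves the meaning of a backslash followed by an ordinary character (such as the s in \s) undefined in extended regular expressions; \s is an ECMAScript/Perl-style shorthand, so the C result suggests the C library accepts it as an extension (glibc's regcomp, for example, treats it as a GNU operator). The portable ERE spelling of the same class is the bracket expression [[:space:]], which std::regex also supports in extended mode. The sketch below wraps std::regex construction and regex_search in a hypothetical helper, tryPattern, that turns a std::regex_error into a value so each pattern/grammar combination can be compared without terminating the program.

```cpp
#include <regex>
#include <string>

// Outcome of compiling one pattern under one grammar and searching `text`.
enum class Outcome { Match, NoMatch, CompileError };

// Hypothetical helper: builds a std::regex with the given grammar flag and
// runs regex_search. A std::regex_error thrown by the constructor is
// reported as CompileError instead of escaping and terminating the program.
Outcome tryPattern(const std::string& pattern, const std::string& text,
                   std::regex_constants::syntax_option_type grammar) {
    try {
        std::regex re(pattern, grammar);
        return std::regex_search(text, re) ? Outcome::Match : Outcome::NoMatch;
    } catch (const std::regex_error&) {
        return Outcome::CompileError;
    }
}
```

With a single space as the text, tryPattern("\\s", " ", std::regex_constants::ECMAScript) and tryPattern("[[:space:]]", " ", std::regex_constants::extended) both report Match. The extended-grammar "\\s" case reports CompileError in the question's setup; because POSIX leaves that escape undefined, another standard library or build mode may instead treat it differently, so it is best not relied on.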

What is going on here?
