jeudi 23 février 2017

std::regex_match and lazy quantifiers with strange behavior

I know that:
Lazy quantifier matches: As Few As Possible (shortest match)

Also know that the constructor:

basic_regex( ...,
            flag_type f = std::regex_constants::ECMAScript );

And:
ECMAScript supports non-greedy matches,
and the ECMAScript regex "<tag[^>]*>.*?</tag>"
would match only until the first closing tag ... en.cppreference

And:
At most one grammar option must be chosen out of ECMAScript, basic, extended, awk, grep, egrep. If no grammar is chosen, ECMAScript is assumed to be selected ... en.cppreference

And:
Note that regex_match will only successfully match a regular expression to an entire character sequence, whereas std::regex_search will successfully match subsequences...std::regex_match


Here is my code: + Live

#include <iostream>
#include <string>
#include <regex>

int main(){

        std::string string( "s/one/two/three/four/five/six/g" );
        std::match_results< std::string::const_iterator > match;
        std::basic_regex< char > regex ( "s?/.+?/g?" );  // non-greedy
        bool test = false;
        // okay recognize the lazy operator .+?
        test = std::regex_search( string, match, regex );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
        // does not recognize the lazy operator .+?
        test = std::regex_match( string, match, regex );
        std::cout << test << '\n';
        std::cout << match.str() << '\n';
}  

and the output:

1
s/one/
1
s/one/two/three/four/five/six/g

Process returned 0 (0x0)   execution time : 0.008 s
Press ENTER to continue.


Where is the problem?
std::regex_match should not match anything and it should return 0 with non-greedy quantifier .+?

regex101

Fast test:

$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+?\/g?/ && print $&'
$ s/one/
$
$ echo 's/one/two/three/four/five/six/g' | perl -lne '/s?\/.+\/g?/ && print $&'
$ s/one/two/three/four/five/six/g


NOTE
this regex: std::basic_regex< char > regex ( "s?/.+?/g?" ); non-greedy
and this : std::basic_regex< char > regex ( "s?/.+/g?" ); greedy
have the same output with std::regex_match. Still both match the entire of the string!
But with std::regex_search have the different output.
Also s? or g? does not matter and with /.*?/ still matches the entire of the string!

More Detail

g++ --version
g++ (Ubuntu 6.2.0-3ubuntu11~16.04) 6.2.0 20160901

Aucun commentaire:

Enregistrer un commentaire