Tuesday, September 22, 2015

Repeated Zero-Width Lookahead

The other day, I started thinking about how a regex engine might be implemented, and one potential problem I came up with involves zero-width lookaheads and repetition. For example, matching the regex /((?=x))*/ (the extra parentheses avoid invalid syntax) against the string "xx" would try to match the inner group as many times as possible. Starting at the beginning of the string, the zero-width lookahead (ZWLA) succeeds, so the group counts as a match, but it doesn't consume any characters. The next iteration then starts at the same position and succeeds again, so one might expect the regex engine to enter an infinite loop.
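To make the concern concrete, here is a minimal sketch of one way an engine could guard a repetition against this: stop as soon as an iteration succeeds without advancing. The ECMAScript specification describes a rule along these lines (an iteration that matches the empty string is rejected), but the function names below (match_star, lookahead_x, literal_x) are hypothetical and this is not how any particular engine is written. It also leaves out backtracking.

```python
def lookahead_x(text, pos):
    # Hypothetical matcher for (?=x): succeeds if the next character is "x",
    # but returns the same position because a lookahead consumes nothing.
    return pos if text.startswith("x", pos) else None


def literal_x(text, pos):
    # Hypothetical matcher for a plain "x": consumes one character on success.
    return pos + 1 if text.startswith("x", pos) else None


def match_star(body, text, pos):
    # Greedy body* without backtracking. Returns the position after the
    # last useful iteration.
    while True:
        nxt = body(text, pos)
        if nxt is None:
            return pos  # body failed, so the repetition ends here
        if nxt == pos:
            return pos  # body matched nothing; repeating would loop forever
        pos = nxt


print(match_star(lookahead_x, "xx", 0))
print(match_star(literal_x, "xx", 0))
```

Without the `nxt == pos` check, the first call would never return, which is exactly the loop described above. With it, the lookahead repetition ends immediately at position 0, while the ordinary x* consumes both characters.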

When tested in GNU C++11, regex_match returned false.

When tested on regex101, it DID return a match.
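One possible explanation for the difference, offered as an assumption rather than something checked against either tool: std::regex_match only succeeds when the pattern matches the entire input, while regex101 looks for a match anywhere in the string. This pattern can only ever match the empty string, so a whole-string match against "xx" has to fail no matter how the repetition is handled. The sketch below uses Python's re module as a stand-in (not the C++ library from the test above) to show the same split between a whole-string match and a prefix match.

```python
import re

# Same pattern as in the post, written as a Python raw string.
pattern = re.compile(r"((?=x))*")

# Whole-string match, comparable in spirit to std::regex_match.
whole = pattern.fullmatch("xx")

# Match anchored only at the start, closer to what a search-style tool reports.
prefix = pattern.match("xx")

print(whole)
print(prefix.span())
```

In a recent Python 3, the whole-string attempt finds nothing, and the prefix attempt reports an empty match at the start of the string instead of hanging.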

Would this kind of regex construct be considered "ill-formed"? Or is there a standard behavior for this kind of thing?
