Thursday, 14 May 2015

Regex with all components optional: how to avoid empty matches

I have to process a comma-separated string that contains triplets of values and translate them to runtime types. The input looks like:

"1x2y3z,80r160g255b,48h30m50s,1x3z,255b,1h,..."

So each substring should be transformed this way:

"1x2y3z"      should become Vector3 with x = 1,  y = 2,   z = 3
"80r160g255b" should become Color   with r = 80, g = 160, b = 255
"48h30m50s"   should become Time    with h = 48, m = 30,  s = 50

The problem I'm facing is that all the components are optional (though they always keep their order), so the following strings are also valid Vector3, Color and Time values:

"1x3z" Vector3 x = 1, y = 0, z = 3
"255b" Color   r = 0, g = 0, b = 255
"1h"   Time    h = 1, m = 0, s = 0

The regex I'm using is the one below:

((?:\d+A)?(?:\d+B)?(?:\d+C)?)

A, B and C are replaced with the correct letters for each case. The expression almost works, but it returns twice the expected number of results: one match for the string and another match for an empty string right after the first one. For example:

"1h1m1s" two matches [1]: "1h1m1s" [2]: ""
"11x50z" two matches [1]: "11x50z" [2]: ""
"11111h" two matches [1]: "11111h" [2]: ""
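
The doubled result can be reproduced with a small helper. This is only a sketch: the function name all_matches is made up for illustration, and it assumes the matches are collected with std::sregex_iterator, which the post does not state.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every match that std::sregex_iterator
// reports for `pattern` inside `text`, including empty ones.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::string& pattern) {
    const std::regex re(pattern);  // default grammar is ECMAScript
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        out.push_back(it->str());  // str() is the whole match
    }
    return out;
}
```

Calling all_matches("1h1m1s", R"re(((?:\d+h)?(?:\d+m)?(?:\d+s)?))re") should return two entries, "1h1m1s" followed by an empty string: after consuming the whole input, the iterator searches once more at the end position, where the all-optional pattern still succeeds with zero characters.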

This isn't unexpected: after all, an empty string matches the expression when ALL of the components are absent. To fix this I tried the following:

((?:\d+[ABC]){1,3})

But now the expression also matches strings with the wrong ordering or even repeated components:

"1s1m1h" one match, should not match at all! (wrong order)
"11z50z" one match, should not match at all! (repeated components)
"1r1r1b" one match, should not match at all! (repeated components)
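
A quick way to check this behaviour is sketched below. The helper name accepted_by_class_variant is hypothetical, and the letters parameter stands in for the A, B and C placeholders.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when the whole token is accepted by the
// character-class variant (?:\d+[ABC]){1,3}.
// `letters` holds the unit letters, e.g. "hms", "xyz" or "rgb".
bool accepted_by_class_variant(const std::string& token,
                               const std::string& letters) {
    const std::regex re("(?:\\d+[" + letters + "]){1,3}");
    return std::regex_match(token, re);  // must match the entire token
}
```

Because the character class accepts any of the three letters at every repetition, accepted_by_class_variant("1s1m1h", "hms") and accepted_by_class_variant("11z50z", "xyz") both return true. The {1,3} quantifier only limits how many components appear, not which ones or in what order.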

As a last attempt, I tried this variant of my first expression:

^((?:\d+A)?(?:\d+B)?(?:\d+C)?)$

It works better than the first version, but it still matches the empty string. It also means I have to tokenize the input first and pass each token to the expression separately, so that each test string can satisfy the begin (^) and end ($) anchors.
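
The tokenize-first approach could look roughly like the sketch below. It assumes the input is split on commas with std::getline and that empty tokens are rejected by an explicit check rather than by the regex. The name matching_tokens is made up, and letters must contain exactly three characters in the required order.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split `input` on commas and keep only the tokens
// that fully match the ordered, all-optional pattern.
// Assumption: `letters` has exactly three characters, e.g. "hms".
std::vector<std::string> matching_tokens(const std::string& input,
                                         const std::string& letters) {
    const std::string pattern = std::string("(?:\\d+") + letters[0] +
                                ")?(?:\\d+" + letters[1] +
                                ")?(?:\\d+" + letters[2] + ")?";
    const std::regex re(pattern);
    std::vector<std::string> out;
    std::istringstream stream(input);
    std::string token;
    while (std::getline(stream, token, ',')) {
        // regex_match is implicitly anchored at both ends, so ^ and $
        // are not needed; the empty() check rejects the empty string.
        if (!token.empty() && std::regex_match(token, re)) {
            out.push_back(token);
        }
    }
    return out;
}
```

For example, matching_tokens("1h1m1s,1s1m1h,,30m,1x2y", "hms") keeps only "1h1m1s" and "30m". The price is the extra splitting pass, which is exactly the step the single-expression approach was meant to avoid.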

I've also tried to use lookahead, but I don't understand how it works and I gave up on that attempt.

So the question is: is there a regular expression that matches a triplet of values in a given order, where every component is optional but at least one component must be present?

The regex library I'm using is the C++11 one (std::regex).
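
For context, the default ECMAScript grammar of std::regex supports positive lookahead, written (?=...), which checks what follows without consuming it. One possible direction, offered only as a sketch, is to put (?=\d) in front of the first expression so that a match must start with a digit, which rules out the empty match. The sketch assumes it is applied to a single token that has already been split from the input. The function name parse_triplet and the use of std::array for the result are illustrative choices, not part of the original code.

```cpp
#include <array>
#include <regex>
#include <string>

// Hypothetical helper: parse one token into three values.
// Missing components default to 0. Returns false when the token is
// empty, out of order, or contains a repeated component.
// Assumption: `letters` has exactly three characters, e.g. "xyz".
bool parse_triplet(const std::string& token, const std::string& letters,
                   std::array<int, 3>& out) {
    // (?=\d) is a lookahead: the token must begin with a digit,
    // so at least one of the optional groups has to participate.
    const std::string pattern = std::string("(?=\\d)(?:(\\d+)") + letters[0] +
                                ")?(?:(\\d+)" + letters[1] +
                                ")?(?:(\\d+)" + letters[2] + ")?";
    const std::regex re(pattern);
    std::smatch m;
    if (!std::regex_match(token, m, re)) {
        return false;
    }
    for (std::size_t i = 0; i < 3; ++i) {
        // Capture groups 1..3 hold the digits of each component.
        // std::stoi throws std::out_of_range for very large numbers.
        out[i] = m[i + 1].matched ? std::stoi(m[i + 1].str()) : 0;
    }
    return true;
}
```

With this, parse_triplet("1x3z", "xyz", v) fills v with 1, 0, 3, while "", "1s1m1h" (with "hms") and "11z50z" (with "xyz") are rejected. When the pattern is used with a search over the whole comma-separated input instead of a full match per token, the lookahead alone still prevents empty matches but does not reject out-of-order tokens, because each component can then match on its own.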
