Saturday, May 2, 2015

C++ Regex: non-greedy match

I'm trying to write a regex that matches URL parameters and extracts them.

For example, given the parameter string ?param1=someValue&param2=someOtherValue, std::regex_match should extract the following contents:

  • param1
  • someValue
  • param2
  • someOtherValue

After trying several patterns, I arrived at one that corresponds to what I want: std::regex("(?:[\\?&]([^=&]+)=([^=&]+))*").

With the previous example, std::regex_match matches as expected. However, it does not extract the expected values: only the values captured by the last repetition are kept.

For example, the following code:

#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::regex paramsRegex("(?:[\\?&]([^=&]+)=([^=&]+))*");
    std::string arg = "?param1=someValue&param2=someOtherValue";
    std::smatch sm;

    std::regex_match(arg, sm, paramsRegex);
    for (const auto &match : sm)   // sm[0] is the whole match, then the groups
        std::cout << match << std::endl;
}

will give the following output (leaving out the first line, which is the whole match, sm[0]):

param2
someOtherValue

As you can see, param1 and its value are skipped and not captured.
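For context, in the default ECMAScript grammar a capture group placed inside a repetition keeps only the text from its last iteration, so a single std::regex_match call cannot return every pair. A minimal sketch of one common workaround follows: match a single pair at a time and walk over all matches with std::sregex_iterator. The helper name extractParams and the simplified pattern are assumptions made for illustration, not part of the original code.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not from the original post): returns every
// key/value pair found in a query string such as "?a=1&b=2".
std::vector<std::pair<std::string, std::string>>
extractParams(const std::string &query)
{
    // One pair per match: there is no outer repetition, so each match
    // keeps its own two capture groups.
    static const std::regex pairRegex("[?&]([^=&]+)=([^=&]+)");

    std::vector<std::pair<std::string, std::string>> params;
    for (std::sregex_iterator it(query.begin(), query.end(), pairRegex), end;
         it != end; ++it)
    {
        const std::smatch &m = *it;
        assert(m.size() == 3); // whole match + two capture groups
        params.emplace_back(m[1].str(), m[2].str());
    }
    return params;
}
```

Called with "?param1=someValue&param2=someOtherValue", this sketch should yield two entries, (param1, someValue) and (param2, someOtherValue). Note that it searches for pairs anywhere in the input and does not validate that the whole string is a well-formed query.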

After searching on Google, I found that this is apparently due to greedy capture, so I changed my regex to "(?:[\\?&]([^=&]+)=([^=&]+))\\*?" in order to enable non-greedy capturing.

This regex works well when I try it on Rubular, but it does not match when I use it in C++ (std::regex_match returns false and nothing is captured).

I've tried different std::regex_constants options (other regex grammars such as std::regex_constants::grep, std::regex_constants::egrep, ...) but the result is the same.

Does anyone know how to do non-greedy regex capture in C++?
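A possible explanation, sketched below: in C++ source the sequence "\\*?" reaches the regex engine as \*?, which is an optional literal asterisk rather than a lazy quantifier. The pattern then describes exactly one key/value pair, so std::regex_match, which must consume the whole input, fails on a two-pair string, while a search-style match still finds the first pair. The names escapedStarRegex, matchesWhole and firstKey are hypothetical, introduced only for this illustration.

```cpp
#include <regex>
#include <string>

// The modified pattern from the post. In C++ source, "\\*?" reaches the
// regex engine as \*? : an optional literal '*', not a lazy quantifier.
const std::regex escapedStarRegex("(?:[\\?&]([^=&]+)=([^=&]+))\\*?");

// std::regex_match succeeds only if the pattern consumes the entire input.
bool matchesWhole(const std::string &s)
{
    return std::regex_match(s, escapedStarRegex);
}

// std::regex_search accepts a match anywhere in the input. Returns the
// first captured key, or an empty string if nothing matches.
std::string firstKey(const std::string &s)
{
    std::smatch m;
    return std::regex_search(s, m, escapedStarRegex) ? m[1].str()
                                                     : std::string();
}
```

With the example string, matchesWhole returns false while firstKey returns "param1". A tester that searches instead of anchoring the match would report success for the same reason. Writing the quantifier unescaped as "*?" makes it lazy, but std::regex_match would still have to consume the whole string, and the groups would still hold only the last repetition.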
