I want to tokenize a std::string using \s (space), \r (carriage return), \t (tab), and \n (newline) as delimiters, but between a pair of quotes no delimiters should be considered and no other quotes should be allowed. To achieve this, I use the following regex (represented as a raw string literal):
R"((\"[^\"]*\")|[^\n\r\s\t]+)"
which gives the following output when used as the std::regex of a std::sregex_token_iterator:
Test sample [Try It Online]:
#include <algorithm>
#include <iostream>
#include <iterator>
#include <regex>
#include <string>

int main() {
    std::string text = "Quick \"\"\"\" \"brown fox\".";
    // Either a quoted run without inner quotes, or a run of non-whitespace.
    std::regex re(R"((\"[^\"]*\")|[^\n\r\s\t]+)");
    // Submatch index 0 selects the whole match for every token.
    std::copy(std::sregex_token_iterator(text.cbegin(), text.cend(), re, 0),
              std::sregex_token_iterator(),
              std::ostream_iterator<std::string>(std::cout, "\n"));
}
Test output:
Quick
""
""
"brown fox"
.
This results in the inclusion of the surrounding quotes in the sub-matches. Instead, I want to get rid of these surrounding quotes. I can obviously strip them from the iterated sub-matches manually, but is it possible to eliminate the surrounding quotes using only the std::regex and the std::sregex_token_iterator, and if so, how?