Thursday, October 21, 2021

How to define a boost::tokenizer that returns boost::iterator_range

I am trying to parse a file in which each line consists of attributes separated by ;. Each attribute is written as key value or key=value, where key and value can be enclosed in double quotes " so that they can contain special characters such as whitespace, the equal sign = or the semicolon ;.

To do so, I first split each line into attributes with boost::algorithm::make_split_iterator, and then, to handle the double quotes, I tokenize each attribute with boost::tokenizer.

I need every key and value to be returned as a boost::iterator_range<const char*>. I wrote the code below, but it does not build. The tokenizer definition may be correct and the error may instead come from printing the iterator_range. I can provide more information if necessary.

#include <boost/algorithm/string.hpp>
#include <boost/range/iterator_range.hpp>
#include <boost/tokenizer.hpp>

boost::iterator_range<const char*> line;

const auto topDelim = boost::token_finder(
  [](const char c) { return (c == ';'); },
  boost::token_compress_on);
for (auto attrIt = make_split_iterator(line, topDelim); !attrIt.eof() && !attrIt->empty(); attrIt++) {
  std::string escape("\\");
  std::string delim(" =");
  std::string quote("\"");
  boost::escaped_list_separator<char> els(escape, delim, quote);
  boost::tokenizer<
    boost::escaped_list_separator<char>,
    boost::iterator_range<const char*>::iterator, // how to define iterator for iterator_range?
    boost::iterator_range<const char*>
  > tok(*attrIt, els);

  for (auto t : tok) {
    std::cout << t << std::endl;
  }
}

Build errors:

/third_party/boost/boost-1_58_0/include/boost/token_functions.hpp: In instantiation of 'bool boost::escaped_list_separator<Char, Traits>::operator()(InputIterator&, InputIterator, Token&) [with InputIterator = const char*; Token = boost::iterator_range<const char*>; Char = char; Traits = std::char_traits<char>]':
/third_party/boost/boost-1_58_0/include/boost/token_iterator.hpp:70:36:   required from 'void boost::token_iterator<TokenizerFunc, Iterator, Type>::initialize() [with TokenizerFunc = boost::escaped_list_separator<char>; Iterator = const char*; Type = boost::iterator_range<const char*>]'
/third_party/boost/boost-1_58_0/include/boost/token_iterator.hpp:77:63:   required from 'boost::token_iterator<TokenizerFunc, Iterator, Type>::token_iterator(TokenizerFunc, Iterator, Iterator) [with TokenizerFunc = boost::escaped_list_separator<char>; Iterator = const char*; Type = boost::iterator_range<const char*>]'
/third_party/boost/boost-1_58_0/include/boost/tokenizer.hpp:86:33:   required from 'boost::tokenizer<TokenizerFunc, Iterator, Type>::iter boost::tokenizer<TokenizerFunc, Iterator, Type>::begin() const [with TokenizerFunc = boost::escaped_list_separator<char>; Iterator = const char*; Type = boost::iterator_range<const char*>; boost::tokenizer<TokenizerFunc, Iterator, Type>::iter = boost::token_iterator<boost::escaped_list_separator<char>, const char*, boost::iterator_range<const char*> >]'
test.cpp:21:23:   required from here
/third_party/boost/boost-1_58_0/include/boost/token_functions.hpp:188:19: error: no match for 'operator+=' (operand types are 'boost::iterator_range<const char*>' and 'const char')
  188 |           else tok+=*next;
      |                ~~~^~~~~~~
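The error above shows that escaped_list_separator builds each token by appending one character at a time (tok += *next), so the Token type has to behave like a string. That is also how it drops quotes and resolves escapes, which means a token is not necessarily a contiguous piece of the input. One possible way around this, sketched below under several assumptions, is a small hand-written splitter that returns pointer pairs into the original buffer. The names CharRange, splitQuoted and toString are hypothetical and not part of Boost, the pair is only a stand-in for boost::iterator_range<const char*>, and unlike escaped_list_separator this sketch keeps the quotes in the result and does not interpret backslash escapes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for boost::iterator_range<const char*>:
// a [begin, end) pair of pointers into the original buffer.
using CharRange = std::pair<const char*, const char*>;

// Split [first, last) at any character found in `delims`, except inside
// double quotes. No characters are copied: every result points into the input.
// Assumptions of this sketch: quotes stay part of the returned ranges,
// backslash escapes are not interpreted, and empty tokens are skipped.
inline std::vector<CharRange> splitQuoted(const char* first, const char* last,
                                          const std::string& delims) {
  assert(first <= last);
  std::vector<CharRange> out;
  bool inQuotes = false;
  const char* tokenStart = first;
  for (const char* p = first; p != last; ++p) {
    if (*p == '"') {
      inQuotes = !inQuotes;  // toggle quoted state; delimiters are ignored inside
    } else if (!inQuotes && delims.find(*p) != std::string::npos) {
      if (p != tokenStart) out.emplace_back(tokenStart, p);
      tokenStart = p + 1;    // next token starts after the delimiter
    }
  }
  if (tokenStart != last) out.emplace_back(tokenStart, last);
  return out;
}

// Inspection helper only: copies a range into a std::string.
inline std::string toString(const CharRange& r) {
  return std::string(r.first, r.second);
}
```

Calling splitQuoted on a whole line with ";" as delimiter and then on each resulting attribute with " =" mirrors the two-level split described in the question. Each pair can be turned into a real range with boost::make_iterator_range(r.first, r.second). Stripping the surrounding quotes can still be done without copying by narrowing the range, whereas resolving escapes would require a copy.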
