Friday, December 4, 2015

Parsing code files faster

I wrote a fairly complex parser for a stack-based language. It loads a file into memory and then compares each token to decide whether it is an operand or an instruction.

Every time I parse a new operand or instruction, I std::copy the bytes from the file buffer into a std::string and then do:

if(parsed_string.compare("add") == 0) { /* handle addition */ }
else if(parsed_string.compare("sub") == 0) { /* handle subtraction */ } 
else { /* This is an operand */ }

Unfortunately, all these copies make the parsing slow.

How should I handle this to avoid all these copies? I always thought I didn't need a tokenizer, since the language itself and its logic are pretty simple.

Edit: I'm adding the code where the copies happen for the various operands and instructions:

  // This function accounts for 70% of the total time of the program
  std::string Parser::read_as_string(size_t start, size_t end) {

    // First copy: file buffer -> temporary vector
    std::vector<char> file_memory(end - start);
    read_range(start, end - start, file_memory);
    // Second copy: temporary vector -> string
    std::string result(file_memory.data(), file_memory.size());
    return result; // Intended to be consumed; a plain return lets the compiler move or elide
  }

  // Copies `size` bytes of the in-memory file, starting at `start`, into `destination`
  void Parser::read_range(size_t start, size_t size, std::vector<char>& destination) {

    if (destination.size() < size)
      destination.resize(size); // Allocate necessary space

    std::copy(file_in_memory.begin() + start,
      file_in_memory.begin() + start + size,
      destination.begin());
  }
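
For comparison, here is a minimal sketch of what a copy-free version of the same lookup might look like. It assumes a C++17 compiler (for std::string_view) and that file_in_memory is a std::string, which the code above does not actually show. The names view_range, classify and TokenKind are hypothetical and are not part of the parser above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical token kinds; illustrative names, not taken from the original parser.
enum class TokenKind { Add, Sub, Operand };

// Returns a non-owning view of buffer[start, end). No characters are copied.
// Assumption: the file buffer is a std::string. The view is only valid while
// `buffer` is alive and unmodified.
std::string_view view_range(const std::string& buffer, std::size_t start, std::size_t end) {
  assert(start <= end && end <= buffer.size());
  return std::string_view(buffer.data() + start, end - start);
}

// Classifies a token by comparing the view against the known instruction names.
TokenKind classify(std::string_view token) {
  if (token == "add") return TokenKind::Add;
  if (token == "sub") return TokenKind::Sub;
  return TokenKind::Operand; // anything else is treated as an operand
}
```

With something like this, each token is just a pointer and a length into the file buffer, so no allocation or copy happens per token. The trade-off is that the views must not outlive the buffer they point into.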
