Friday, February 27, 2015

File parsing done right

I am trying to learn how to write good parsers that:



  • Are easy to implement

  • Are easy to maintain

  • Are easy to extend (to add new features)


I've been reading several documents on the differences between lexers and parsers. If I understood correctly, the lexer is essentially responsible for reading the input and producing tokens, which are then passed to the parser.


For this example, let's take the .ini format narrowed down to a very basic syntax, where each line is either a value, a section header, or a comment.


Example:



# this is a comment
[section]
option = value
option-2 = "quoted value"


My naive implementation uses the following tokens (see the sketch after the list):



  • CommentToken: when you match '#'

  • BeginSection: when you match '['

  • EndSection: when you match ']'

  • Assign: when you match '='

  • Quote: when you match '"'

  • Word: when you match an entire word (no space)
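
To make the rest concrete, here is a minimal C++11 sketch of what those tokens could look like (the TokenKind, Token and TokenStream names are my own, not from any particular library):

#include <string>
#include <vector>

// The token kinds from the list above.
enum class TokenKind {
    Comment,       // '#'
    BeginSection,  // '['
    EndSection,    // ']'
    Assign,        // '='
    Quote,         // '"'
    Word           // a run of non-space characters
};

struct Token {
    TokenKind kind;
    std::string text;   // the matched characters, mostly useful for Word
};

using TokenStream = std::vector<Token>;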


First question: where does the lexer's responsibility end and the parser's begin?


If the user forgot to close a quoted value (e.g. option = "abc), is the lexer responsible for raising an error?


Would you either:



  • push Word, Assign, Quote, Word

  • throw an error during the lexical analysis process


And if a quoted value contains the '[' character (e.g. option = "hello [myname]"), is the lexer responsible for not pushing BeginSection because we are inside a quoted value?
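
One way to answer both questions, sketched below, is to let the lexer own everything between the quotes: it consumes the whole quoted value as a single Word token, so a '[' inside quotes never becomes BeginSection, and it reports an unterminated quote as a lexical error. The lexQuotedValue helper is hypothetical:

#include <cstddef>
#include <stdexcept>
#include <string>

// Reads a quoted value; 'pos' points at the opening '"' on entry and is
// advanced past the closing '"' on exit.
std::string lexQuotedValue(const std::string& line, std::size_t& pos) {
    std::size_t start = ++pos;                  // skip the opening quote
    while (pos < line.size() && line[pos] != '"') {
        ++pos;                                  // '[' or ']' here is plain data
    }
    if (pos == line.size()) {
        // The second option from the list above: fail during lexical analysis.
        throw std::runtime_error("unterminated quoted value");
    }
    std::string value = line.substr(start, pos - start);
    ++pos;                                      // skip the closing quote
    return value;
}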


Second question: what is the best OO approach in C++11 for parsing the produced tokens?


Once lexical analysis has filled a stack of tokens, what would be the best object-oriented approach to process them?


In a more procedural fashion, you would push something like a tagged union whose fields are set according to the corresponding token type.
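
Here is a sketch of one object-oriented approach in C++11, assuming the lexer has already folded quoted values into Word tokens as in the sketch above: keep the tokens in a flat std::vector and let a small Parser class walk them and build a document model, instead of pushing a union per token. All class and function names are hypothetical:

#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class TokenKind { Comment, BeginSection, EndSection, Assign, Quote, Word };
struct Token { TokenKind kind; std::string text; };

using Section  = std::map<std::string, std::string>;   // option -> value
using Document = std::map<std::string, Section>;       // section -> options

class Parser {
public:
    explicit Parser(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}

    Document parse() {
        Document doc;
        std::string current;                    // options before any [section]
        while (!atEnd()) {
            switch (peek().kind) {
            case TokenKind::Comment:      advance(); break;   // whole comment line
            case TokenKind::BeginSection: current = parseSectionHeader(); break;
            case TokenKind::Word:         parseOption(doc[current]); break;
            default: throw std::runtime_error("unexpected token: " + peek().text);
            }
        }
        return doc;
    }

private:
    bool atEnd() const { return pos_ >= tokens_.size(); }
    const Token& peek() const { return tokens_[pos_]; }
    const Token& advance() { return tokens_[pos_++]; }

    const Token& expect(TokenKind kind, const char* what) {
        if (atEnd() || peek().kind != kind)
            throw std::runtime_error(std::string("expected ") + what);
        return advance();
    }

    std::string parseSectionHeader() {
        expect(TokenKind::BeginSection, "'['");
        std::string name = expect(TokenKind::Word, "section name").text;
        expect(TokenKind::EndSection, "']'");
        return name;
    }

    void parseOption(Section& section) {
        std::string key = advance().text;       // option name (Word)
        expect(TokenKind::Assign, "'='");
        section[key] = expect(TokenKind::Word, "value").text;
    }

    std::vector<Token> tokens_;
    std::size_t pos_ = 0;
};

Compared to the procedural union, the class keeps the cursor and the error reporting in one place, and extending the format (say, adding multi-line values) means adding one private method instead of growing a switch over a union.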

