ANTLR based parser passes work phase 1.

This one took a while. Almost too long, I almost dropped the whole thing at least twice. But fortunately I didn’t. So I’m proudly pronounce the ANTLR parser, which in its current status is able to satisfy the “DDT Tests – Core” suite and I’m proud of it. The first difficulty at hand was that I had no previous experience with ANTLR, indeed I had no previous experience with compiler/parser generators at all. Then, there were the problems to understand the way how ambiguous parsing rules can be resolved in ANTLR.

Then, there was the problem of getting out of sync of Bruno’s work. And there was an enormous amount of work with the AST classes to have the proper constructors in place, thus creating the AST without the descent.compiler format converter. And then I had to chew my self through 300 test cases to make them work. All done.

This work started with a few attempts before I settled the basic framework of how I’ll proceed with it. At this point of time, I can see major drawbacks of it works, but this is due to the problem that I didn’t want to invent a completely different AST management, so that all the rest of the DDT could go on as no parser replacement had occurred. This is the reason why the grammar file is so bloated with action code.

Let’s see what is yet to be done:

  • Error handling. There’s only a minimal error reporting/recovering support in the parser to get it running but it is not sufficient at all from the user’s point of view.
  • Review of all the latest and supported language features and the correct the functionality accordingly.
  • Incremental parsing. This is something I would not be able to address in any near future because it seems quite tedious piece of work. However, it seems quite reasonable to implement it, as it seems a huge possible improvement for the performance. (Let’s say, the user is has a line like this: “class Foo {}” and types in the block: “class Foo { int a; }”. In an incremental way it would cut right to the declDefs of the DefinitionClass instead of going a new round completely build the AST from scratch.)
  • Performance: At the moment the grammar file results in a huge parser code, and there are plenty of forward checks in the grammars. Perhaps some of them is avoidable with some cleverness, and the parsing is where all the performance improvement is much needed.
  • ANTLR compatible AST hierarchy. This would result some linear performance gains and some code clarity (removing the action code blocks). The problem here, when I visited this solution is that it seems ANTLR hasn’t got decent heterogeneous AST-support. All we have is a token based constructor. It could be useful though in case of operators or single-token rules, like some attributes.
  • Building the parser will need the ANTLR binaries and additional build rules. For the releases the lexer/parser sources should be attached to the code tree as they are.

All these things would be nice to have, but only the first is needed for the complete replacement of the descent.compiler.

Corresponding code here: http://code.google.com/a/eclipselabs.org/r/gyulagubacsi-ddt-contrib/source/list?name=feature-antlr-parser

Leave a comment