ANTLR-based parser passes work phase 1

This one took a while, almost too long; I nearly dropped the whole thing at least twice, but fortunately I didn’t. So I can proudly announce the ANTLR parser, which in its current state is able to satisfy the “DDT Tests – Core” suite, and I’m proud of it. The first difficulty was that I had no previous experience with ANTLR; in fact, I had no previous experience with compiler/parser generators at all. Then there was the problem of understanding how ambiguous parsing rules can be resolved in ANTLR.
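
To give a flavour of what that means, here is a tiny, hypothetical example (the rule names are made up, not taken from the actual grammar) of how an ANTLR v3 syntactic predicate can decide between alternatives that start with the same tokens:

    // Hypothetical sketch: a declaration and an expression statement can begin with
    // the same tokens, so the syntactic predicate tells ANTLR to try the declaration
    // alternative first and fall back to the next one if it does not match.
    statement
        : (declaration) => declaration
        | expressionStatement
        ;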

Then there was the problem of getting out of sync with Bruno’s work. There was also an enormous amount of work on the AST classes to get the proper constructors in place, so that the AST can be created without the descent.compiler format converter. And then I had to chew myself through 300 test cases to make them work. All done.

This work started with a few attempts before I settled on the basic framework of how to proceed. At this point I can see major drawbacks in how it works, but that is because I didn’t want to invent a completely different AST management, so that the rest of DDT could carry on as if no parser replacement had occurred. This is the reason why the grammar file is so bloated with action code.
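
To give an idea of what that looks like, here is a made-up miniature in the same style (the rule and the constructor call are illustrative, not copied from the grammar): nearly every rule returns a ready-made AST node, built by a Java action embedded right in the grammar.

    // Illustrative only; the names are invented, but the shape matches the approach:
    // each alternative carries an embedded Java action that constructs the AST node,
    // which is what bloats the grammar file.
    moduleDeclaration returns [ASTNeoNode node]
        : 'module' name=qualifiedName ';'
          { $node = new ModuleDeclaration($name.text); }
        ;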

Let’s see what is yet to be done:

  • Error handling. There is only minimal error reporting/recovery support in the parser (just enough to get it running), but it is not at all sufficient from the user’s point of view.
  • Review of all the latest supported language features, correcting the functionality accordingly.
  • Incremental parsing. This is something I won’t be able to address in the near future, because it looks like quite a tedious piece of work. However, it seems reasonable to implement, as it could be a huge performance improvement. (Say the user has a line like "class Foo {}" and types inside the block so it becomes "class Foo { int a; }". An incremental parser would cut right to the declDefs of the DefinitionClass instead of building the whole AST from scratch again.)
  • Performance: At the moment the grammar file results in huge parser code, and there are plenty of forward checks in the grammar. Perhaps some of them are avoidable with some cleverness, and parsing is exactly where performance improvements are most needed.
  • ANTLR-compatible AST hierarchy. This would bring some linear performance gains and some code clarity (removing the action code blocks). The problem, when I looked into this solution, is that ANTLR doesn’t seem to have decent heterogeneous AST support; all we get is a token-based constructor. It could still be useful for operators or single-token rules, like some attributes (see the sketch after this list).
  • Building the parser will need the ANTLR binaries and additional build rules. For releases, the generated lexer/parser sources should be attached to the code tree as they are.
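
For reference, the “token based constructor” mentioned above is ANTLR v3’s heterogeneous tree mechanism: a token in the grammar can be tagged with a node type (for example PLUS<BinaryExpNode>), and ANTLR then instantiates that class through a constructor taking the matched Token. A minimal Java sketch with an invented class name:

    // Minimal sketch (invented name, not an actual DDT class): the only hook that
    // ANTLR's heterogeneous trees offer is this Token-taking constructor.
    import org.antlr.runtime.Token;
    import org.antlr.runtime.tree.CommonTree;

    public class BinaryExpNode extends CommonTree {
        public BinaryExpNode(Token token) {
            super(token); // the operator token becomes the node payload
        }
    }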

All these things would be nice to have, but only the first one is needed for a complete replacement of descent.compiler.

Corresponding code here: http://code.google.com/a/eclipselabs.org/r/gyulagubacsi-ddt-contrib/source/list?name=feature-antlr-parser

LL(k) -> ASTNeoNode

I have recently been working on an ANTLR-based parser for the DDT project. As phase 1, I’m trying to get the current AST hierarchy working under this new parser without having to rely on the Descent parser. It isn’t finished yet, but it has reached a level of progress where it is perhaps worth a look by others. To show what is missing, here’s my sketchy list:

  • Missing import expression node in the AST
    importExpression : 'import' '(' assignExpression ')' ;
  • Clarify the function literals
  • Missing static if expression.
  • Missing static assert expression.
  • How to deal with conditional statements (version, debug, and such)?
  • Struct initializer must be implemented.
  • What is ExpIftype for? Is it something to do with the template stuff, or with static if?
  • Is IsExpression in the AST somewhere? (It’s a pretty tough expression, btw; a left-factored sketch follows this list.)
    isExpression
    : is ( Type )
    | is ( Type : TypeSpecialization )
    | is ( Type == TypeSpecialization )
    | is ( Type Identifier )
    | is ( Type Identifier : TypeSpecialization )
    | is ( Type Identifier == TypeSpecialization )
    | is ( Type Identifier : TypeSpecialization , TemplateParameterList )
    | is ( Type Identifier == TypeSpecialization , TemplateParameterList )
  • Template declarations are missing in the parser rules.
  • Template instances are missing in the parser rules.
  • Proper attribute specifier implementation. (That is, accumulating all the attribute specifiers onto the corresponding definition.)
  • Error handling and error recovery resembling that of the Descent parser.
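
Since IsExpression came up above: all eight alternatives of the spec rule share the same prefix, so for an LL(k) grammar they can be collapsed into a single left-factored rule, roughly like this (a sketch only, not the final rule):

    // Covers the same eight alternatives as the spec rule, with the common
    // "is ( Type" prefix factored out.
    isExpression
        : 'is' '(' type
            ( ( ':' | '==' ) typeSpecialization
            | identifier
              ( ( ':' | '==' ) typeSpecialization
                ( ',' templateParameterList )?
              )?
            )?
          ')'
        ;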

The current state is on my clone’s master branch here:
http://code.google.com/a/eclipselabs.org/r/gyulagubacsi-ddt-root/sour…

As I am trying to replace the current parser without changing much of the current state of the code (I made only a few of the most obvious modifications, such as constructors for creating AST nodes without the conversion process, and in very few places I added extra fields to existing nodes), I won’t add any new features at this point, or mess with the AST hierarchy.
This is my first attempt at creating a parser with a parser generator, and I admit that in many places the current state needs improvement. Later on, we should change the ASTNeoNode hierarchy to work on top of ANTLR’s CommonTree object, which would eliminate the need for creating the individual objects in action code. (I’m not completely sure how, but I think it is possible to create heterogeneous trees with ANTLR using a factory pattern, which in turn would require the ASTNeoNode classes and interfaces to be more consistent than they are today. As an example of the inconsistencies, in the current state some nodes use ArrayView, others use simple arrays of objects, and so on.)
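
The factory I have in mind is roughly ANTLR v3’s TreeAdaptor: when the generated parser builds trees, it asks the adaptor to create every node, so a custom adaptor could hand back our own node classes instead of plain CommonTree, provided those classes fit ANTLR’s expectations. A rough, untested Java sketch with invented node names:

    import org.antlr.runtime.Token;
    import org.antlr.runtime.tree.CommonTree;
    import org.antlr.runtime.tree.CommonTreeAdaptor;

    // Rough sketch (invented names): the generated parser would be wired up with
    // parser.setTreeAdaptor(new NeoNodeAdaptor()), after which every tree node is
    // created through create().
    public class NeoNodeAdaptor extends CommonTreeAdaptor {
        @Override
        public Object create(Token token) {
            if (token == null) {
                return new CommonTree(); // nil / imaginary root nodes
            }
            switch (token.getType()) {
                // case DeeParser.PLUS: return new BinaryExpNode(token);
                // case DeeParser.ID:   return new SymbolNode(token);
                default:
                    return new CommonTree(token); // fall back to the generic node
            }
        }
    }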

Under The Bonnet: Parser, Lexer… ANTLR

Sometimes it goes like this: you start to make your well-defined contribution (type inference), and as you assess the job to be done, you realize that some parts are bigger than you thought. So you start looking into them from time to time, and only after a while do you realize that you’re a step further down into the heart of the project. This time I fell into the parser, but for a reason: at work, I was writing a toy parser for a server in the good ol’ hand-crafted way, and I found it fun. Originally I had the idea to use bison, ANTLR or something similar to generate that parser, but I rejected it because I was afraid I couldn’t learn enough about compiler generators to finish my task in reasonable time.

In recent weeks I have tried to get my head around the core of DDT. I found the current visitor accessibility unfortunately narrow for semantic analysis, type deduction or type inference, and ran into some artefacts in the AST design. Well, that’s always good news, as someone has to deal with these issues. Soon enough I found that the lack of a parser targeting our AST directly is a source of unnecessary complexity, not to mention the possible overhead. I say possible, because I couldn’t make myself carry out detailed measurements. OK, OK, I’m lazy! Anyway, the obvious choice was to look into the question of an ANTLR parser.

Some initial experiments taught me that I shouldn’t follow Walter’s BNF-style description, because 1. it is heavy with left-recursive rule definitions, which need serious refactoring for an LL(k) parser, and 2. there are too many differences between our current AST hierarchy and the one the D documentation uses to explain the language.
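
As an example of the first point, the spec writes the expression grammar left-recursively, along the lines of "AddExpression: MulExpression | AddExpression + MulExpression | ...", which an LL(k) parser cannot consume directly; it has to be rewritten as iteration, roughly like this (simplified sketch):

    // Simplified sketch: the left recursion becomes a loop over the operators.
    addExpression
        : mulExpression ( ( '+' | '-' | '~' ) mulExpression )*
        ;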